TEACHER VARIABLES OF WARMTH, DEMAND
AND UTILIZATION OF INTRINSIC MOTIVATION
RELATED TO PUPILS’ SCIENCE INTERESTS:
A STUDY ILLUSTRATING SEVERAL
POTENTIALS OF VARIANCE-COVARIANCE*


HORACE B. REED, Jr. 
Skidmore College 
Saratoga Springs, New York 


Purpose 


THE PURPOSE of this article is to illustrate 
the variety of meaningful questions that can be 
asked concerning the data of a single research, 
and that can be dealt with largely through the tools 
of analysis of variance and covariance. 

The research falls within the category of teacher competence studies,
and is designed with the purpose of identifying selected teacher
behaviors that relate to desirable pupil learning.


The Variables and Rationale 





What is Competence? The phrase “teacher competence” suggests the
question, “Competent to do what?” The teacher’s role as mediator of
learning has been selected here to identify competence. This role has
historical precedence, and most educational researchers now agree that
pupil gains in learning are the most valid criteria of a teacher’s
competence (4, 5, 9, 28).

The Criterion Variable. Selection of desirable pupil changes as the
criteria in teacher competence research poses many difficulties.
Present school objectives involve changes in pupils such as increased
emotional maturity, ability to solve complex and realistic life
problems, increased sensitivity to the welfare of others, enjoyment of
intellectual and artistic pursuits, etc. Educators frankly have no
adequate measures for such objectives. More limited objectives which
may or may not lead to the “ultimate” school purposes are being
measured, but not with universal confidence. At this level of
abstraction would be found school goals such as pupils’ accomplishment
on subject matter tests, attitude tests, interest tests, art
appreciation tests, comprehension tests, etc. A compromise with the
direct measurement of learning has been suggested by Cogan (10). He has
used pupils’ performance of required school work and of self-initiated
work as criteria, describing these as being proximate to what is
ordinarily referred to as pupil change. Cogan’s intervening criteria
are an ingenious method of avoiding the dilemma of over-particularized
measures (such as rate of eyelid conditioning) and over-generalized
measures (such as asking pupils “From which teacher did you gain the
most?”).

*All footnotes will be found at end of article.

In the present research, the criterion variable is interest in science,
as measured by the author’s Science Interest Inventory, a 70-item scale
of the pupil’s voluntary activities in science during the current
school year. A number of factors have contributed to the selection of
interest in science as the criterion: unlike many other pupil change
criteria, interest is relatively independent of ability differences,
especially up to mid-adolescence (13, 33); an analysis of some of the
origins of interest suggests it is amenable to influences such as the
teacher’s classroom behaviors; it is sufficiently permanent to be an
important predictor of many future activities of pupils; and interest
is a worthy school objective in its own right.

Taking all these factors into account, the cri- 
terion of interest in the subject seems to meet a 
number of the common criticisms concerning the 
use of pupil change as a criterion in teacher com- 
petence studies. 

Teacher Behaviors as Antecedents. That the behavior of teachers
produces a difference in the atmosphere of the classroom has been a
truism for hundreds of years. Clear quantitative evidence for this is
of relatively recent origin, perhaps not more than thirty years. The
research on teachers’ classroom personalities by Anderson and Brewer
(3), and the research on leadership and group life headed by Kurt Lewin
(24) demonstrated clearly that children behave differently while under
the training of different “types” of adults. Withall (37) also
demonstrated the existence of different psychological climates for
different teachers with the same group of children. Thelen (34), Callis
(8), Reed (31), Amatora (2), Bush (7), and Cogan (10), each using
different measures, different samples, and different variables but all
with careful designs, found unequivocal evidence that the differences
among teachers can be quantitatively studied, that classroom atmosphere
is partly a function of the teacher’s behaviors, and that the teacher’s
behaviors measurably affect the pupils.

The teacher variables that may affect pupil learning can be categorized
as either environmental or psychological. The first category would
involve measures of the teacher’s home background, number of years of
schooling, the parents’ religious preference, etc. It is reasonable to
assume that such environmental factors (past and present) have an
effect upon teacher behaviors. The second category would be concerned
with the physical, emotional, and cognitive aspects of the teacher. The
antecedent variables of this research fall within this latter category.

The Teacher Behavior Variable of Warmth. Warmth refers to pupils’
perceptions¹ of teacher behaviors which relax interpersonal tension
between teacher and pupil. There is a consequent reduction in the
frequency of pupils’ problems concerning “getting along with the
teacher,” but not necessarily in the frequency of problems concerning
subject-matter tasks. Terms frequently used as synonyms of warmth are
affection, affiliation, consideration, kindness, friendliness,
sympathy, responsiveness, and geniality. Teacher behaviors expressive
of the pupils’ perceptions of warmth are illustrated by this item from
the antecedent scales:

47. This teacher is careful about the
feelings of pupils in this class
1-Not at all. 2-A little.
3-Somewhat. 4-Much.
5-Very much.


An hypothesis predicting a positive relationship between warmth and
pupil interest in science is deduced from the following rationale.² The
teacher’s warm relationship is a rewarding experience for the pupils;
classroom learning activities become rewarding as a function of the
teacher’s warmth behaviors. Pupils’ positive feelings toward the
learning activities lead to participation, which in turn frequently
leads to satisfaction of other needs such as the cognitive. As the
learning activities become inherently attractive, interest is learned.





The Teacher Behavior Variable of Demand. Demand refers to the standards
the teacher sets for each pupil’s performance on school tasks. Demand
is measured by items which refer to the pupil’s perceptions of the
teacher’s level of demand as regards quantity, promptness, correctness,
neatness, depth, thoroughness, honesty, attention, and orderliness. A
sample item expressive of demand is:





15. This teacher demands that we do 
our science reading carefully 
1-Not nearly strong enough. 
2-Almost strongly enough. 
3-Just strongly enough. 
4-Somewhat too strongly. 
5-Much too strongly. 


An hypothesis that demand is non-linearly related to pupils’ interest
in science, with the relationship expressed as an inverted U-shape
(moderate demand being the most effective), is based on the following
ideas. Varying levels of demand generate within each pupil concomitant
variations in tensions. This tension over school work may often
function as an extrinsic motivation. The tensions can become so high
that the pupil is unable to concentrate on the task; his energies are
drained in many different directions in an effort to reduce the
tensions to a bearable level. When tensions are very low, the pupil is
not motivated by the challenge of utilizing his full abilities. When a
moderate level of tension is established by the teacher’s demands, the
pupil is motivated to perform near the limits of his ability and/or
willingness. In this situation, frequent success results in the
schoolwork becoming enjoyable for its own sake.

The Teacher Behavior Variable of Utilization of Intrinsic Motivation.
Intrinsic motivation refers to those motivations which the pupils have
internalized; the learning activities have become meaningful to the
pupils, either as means or as ends. The underlying task of the teacher
is to present the curriculum in such a way that the pupil appreciates
the relationships between it and his needs, interests, attitudes, and
purposes. The following item is expressive of the teacher’s utilization
of intrinsic motivation:

26. When we start new work, this teacher
helps us see why it is important
to us
1-Almost never. 2-Few times.
3-Sometimes. 4-Often.
5-Very often.


An hypothesis that intrinsic motivation is positively related to
pupils’ interest in science is based on the principle that if a teacher
is successful in relating the curriculum to the pupils’ needs,
participation in the learning activities is inherently rewarding and a
continuous feedback of energy is directed toward further voluntary
involvement.


Design of the Research 





The Control Variables. The science teacher’s behaviors are not, of
course, the only determinant of the pupils’ science interest. The
present research design employs four control variables in an attempt to
reduce unaccounted-for variance in the criterion scores: sex of pupil,
father’s interest in science, school subject, and grade level.

Studies of science interests have consistently demonstrated that boys’
interest in science is stronger than girls’. The present analysis
includes a study of classes (teacher-groups), and since the boy-girl
ratio varies from class to class, mean class scores will reflect this
variation. The effect of the sex variable will be tested and, if found
significant, will be controlled through statistical procedures.

It is generally assumed that the interests of 
the father will have an influence on the interests 
of the children. As a simple index of the pupil’s 
perceptions of his (her) father’s interest in sci- 
ence, this item was included in the study: 





50. How much interest does your father 
have in science? 
1-Almost none. 2-A little. 
3-Some. 4-Much. 5-Very much. 


If a fairly strong relationship between father’s 
science interest and pupil’s science interest is 
confirmed by statistical analysis, and if classes 
differ significantly on mean father’s science in- 
terest scores, differences among classes will be 
controlled through appropriate statistical proce- 
dures. 

It is conjectured that the effects of teacher behaviors on pupil
interest may vary as a function of the nature and objectives of the
curriculum. Therefore, teachers and pupils of only one subject
(science³) are used in the sample.

It is also conjectured that the effects of teacher behaviors may vary
as a function of the developmental stage of the pupils. Therefore,
pupils of only one grade (ninth) are used in the sample.

The Pupil Inventory. The Pupil Inventory is composed of two parts. Part
I is the Science Interest Inventory, containing 70 science activity
items. Each item is scored on a frequency scale from zero to five.
Pupils were carefully instructed that only voluntary science activities
performed during the present school year were to be reported. Voluntary
science participation would appear to be a more valid measure of the
pupils’ science interest than would statements of likes and dislikes.
Since only activities of the present school year were noted, the scores
are a measure of performance during the same months that the pupils
were with their present science teacher. This temporal concomitance of
measured science interests of pupils and measured behaviors of science
teachers increases the tenability of reasoning that a significant
statistical relationship between these two variables also implies a
degree of causal relationship.

Part II of the Pupil Inventory is composed of a random ordering of
teacher behavior items; these provide scores for the three antecedent
variables of warmth (13 items), demand (14 items), and utilization of
intrinsic motivation (15 items). Each item is scored on an intensity or
frequency scale from 1 to 5. The ascription of each item to a
particular variable had been agreed upon by at least 90% of 30 scholars
in the fields of education and psychology.

Administration of the Pupil Inventory requires 
less than 40 minutes. 

The Sample—The sample used in the analysis 
included 1045 ninth-grade boys and girls and their 
38 general science teachers, from 19 public 
schools representing 12 school systems in eastern 
Massachusetts. One class for each teacher was 
used, except that in three instances, two classes 
per teacher were inventoried and combined, be- 
cause of the small size of the classes. 

Administering the Pupil Inventory. The Pupil Inventory was administered
by three trained assistants and the author. All data were collected in
late May and early June of 1958, to ensure that pupils were familiar
with their science teacher’s behaviors, and to maximize the teacher’s
influence on the science interests of the pupils. All teachers left the
room during the administering of the Pupil Inventory, with the
exception of two who worked at their desks because the principal
requested they remain, on the basis of legal requirements.

Each administrator read printed instructions aloud to the pupils,
emphasizing the privacy of the pupils’ answers, the importance of
honest and careful responses, and the serious nature of the research. A
high level of pupils’ concentration was observed.

Scoring the Pupil Inventories. Scoring procedures revealed six cases of
pupils’ apparent carelessness or failure to complete the Inventory
adequately. Eleven other responses were not tabulated because the
pupils had been in the science class for less than four months.

The 120 item responses on each of the remaining 1045 Pupil Inventories
were then punched onto IBM cards. To verify the copying operation
involved in card-punching, a print-out from the cards was proofread
against the item responses in the original Inventories. New cards were
punched for any in which errors were found.








Symbols Frequently Used in the Analysis 








W        The independent variable of the pupils’ perceptions of the
         teacher’s production of interpersonal warmth between teacher
         and pupil.

M        The independent variable of the pupils’ perceptions of the
         extent of the teacher’s utilization of intrinsic motivation.

D        The independent variable of the pupils’ perceptions of the
         teacher’s level of demand for pupils’ performance on school
         tasks.

         The independent variable for a compound measure of aspects of
         W and D.⁴

F        The independent variable of the pupil’s perceptions of his
         (her) father’s interest in science.

I        The dependent variable of the pupils’ interest in science.

         Subscript referring to a class (teacher-group).

         Subscript referring to an individual score.

t, w, b  Subscripts referring to total scores, to within-classes
         scores, and to between-classes scores.

A        Sum of products of a dependent and an independent variable,
         with appropriate subscripts.

B        Sum of squares of a dependent variable, with appropriate
         subscripts.

C        Sum of squares of an independent variable, with appropriate
         subscripts.

N        Number of pupils in the sample.

k        Number of classes in the sample.


Questions, Statistical Resolutions, and 
Interpretations 








What is the Reliability of the Scales for the Teacher Variables of W,
M, and D? For the scales measuring the teacher behavior variables, a
reliability based on the stability of the within-classes responses of
pupils is meaningful. Such a measure can be derived from the formula

\[ r_{xx} = 1 - \frac{s_w^2}{s_b^2} \]

where s_w^2 is the within-classes variance estimate and s_b^2 is the
between-classes variance estimate.

The values in Table I are taken from the analysis of
variance-covariance reported later in this article. The reliabilities
are between .88 and .93 for variables W and M, indicating that pupils
of a given class agree fairly closely in their perceptions of these
teacher behaviors. The reliabilities for D are somewhat lower (.78 to
.80). A reasonable explanation for the lower reliability of the demand
scale is that, as some teachers fail to vary their demands to meet
pupil differences, variations in pupil abilities affect the pupils’
perceptions of the level of a teacher’s demands.
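
The reliability and standard error figures for the demand scale can be checked with a short sketch. It assumes the reliability is one minus the ratio of the within-classes to the between-classes variance estimate, and that the standard error of measurement is the square root of the within-classes estimate; the function names are hypothetical, and the variance estimates are those printed for D in Table I.

```python
import math


def class_reliability(between_var: float, within_var: float) -> float:
    """Reliability of class ratings: 1 - within / between variance estimate."""
    return 1.0 - within_var / between_var


def standard_error_of_measurement(within_var: float) -> float:
    """Square root of the within-classes variance estimate."""
    return math.sqrt(within_var)


# Variance estimates for the demand scale (D), as printed in Table I.
demand = {
    "boys": {"between": 373.14, "within": 82.43},
    "girls": {"between": 453.27, "within": 92.47},
}

for sample, v in demand.items():
    r_xx = class_reliability(v["between"], v["within"])
    sem = standard_error_of_measurement(v["within"])
    print(f"D, {sample}: r_xx = {r_xx:.2f}, standard error = {sem:.1f}")
```

Under these assumptions the sketch returns .78 and .80, the two demand reliabilities quoted in the text.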

Can Single-Pupil Perceptions be Considered a Reliable Source Concerning
Teacher Behaviors? The standard error of measurement of the scales can
be determined by calculating the square root of the within-classes
variance estimate. With all three scales, the high standard errors of
measurement listed in Table I signify that no single pupil’s rating of
his teacher is a reliable estimate of the score that the class will
give that teacher. This does not mean that the perception of any given
pupil is unreliable for that pupil. A pupil who perceives his science
teacher as highly motivating, while most of his classmates rate the
teacher low, will be affected by the teacher as he sees the teacher.
The classroom science teacher can expect pupils to agree in general as
to how they see him; he can also expect some divergence of individual
pupil impressions. The concept of “a class” can lead to erroneous
conclusions if the teacher supposes that his behaviors are uniformly
perceived.

Are There Significant Differences Among the Teachers for the Antecedent
Variables? Table I shows the variance estimates for testing the
statistical hypothesis of no difference among class means for each of
the major variables of W, M, and D (data are also included for the
criterion variable I). Acceptance of the hypothesis of no difference
for any variable would render meaningless a between-classes analysis
involving the variable. Since all F’s are significant at the .001
level, this hypothesis is rejected. Each of the scales, then, did
measure differences among teachers.
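
A minimal sketch of the variance ratio used for this test is given below. It is not the original punched-card computation; the function name and the three small classes of scores are hypothetical, chosen only to show how the between-classes and within-classes variance estimates are formed.

```python
def one_way_anova(groups):
    """Return (between_ms, within_ms, f_ratio, df_between, df_within)."""
    n_total = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-classes sum of squares: class means about the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-classes sum of squares: scores about their own class mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )

    df_between, df_within = k - 1, n_total - k
    between_ms = ss_between / df_between
    within_ms = ss_within / df_within
    return between_ms, within_ms, between_ms / within_ms, df_between, df_within


# Hypothetical warmth scores for three small classes (teacher-groups).
classes = [[40, 44, 42], [30, 33, 36], [50, 47, 53]]
between_ms, within_ms, f_ratio, df_b, df_w = one_way_anova(classes)
print(f"F = {f_ratio:.2f} with {df_b} and {df_w} df")
```

The resulting F is referred to a table of F with k - 1 and N - k degrees of freedom, which correspond to the 37 and 546 (boys) or 33 and 418 (girls) shown in Table I.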

Do Boys and Girls Perceive Teachers Similarly as Regards the Teacher
Variables of W, M, and D? The fairly close agreement between the boy
and girl means for W, M, and D shown in Table II suggests that the two
sexes tend to rate their teachers in a similar fashion. Tests of the
difference between boys’ and girls’ means for each of the three
variables provided no significant CR’s at the .01 level.
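
The critical ratio (CR) can be illustrated with the demand scale, the only variable for which Table II preserves both standard deviations. The sketch assumes the CR is the difference between two independent means divided by the standard error of that difference; the function name is hypothetical.

```python
import math


def critical_ratio(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Difference between two independent means over its standard error."""
    se_diff = math.sqrt(sd_1 ** 2 / n_1 + sd_2 ** 2 / n_2)
    return (mean_1 - mean_2) / se_diff


# Demand (D) means, standard deviations, and N's for boys and girls (Table II).
cr_demand = critical_ratio(37.1, 10.0, 584, 35.7, 10.9, 452)
print(f"CR for D, boys vs. girls: {cr_demand:.2f}")
```

Under this assumption the ratio comes to a little over 2.1, which falls short of the 2.58 required at the .01 level and so agrees with the note to Table II.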

What is the Reliability of the Science Interest Inventory? A
reliability based on the stability of the within-classes responses of
pupils is not applicable to the criterion scale measuring the
individual pupil’s interest in science; instead, an internal
reliability concerned with the consistency of the items in the science
activity scale is needed. Using a split-half reliability technique, the
70 items of the Science Interest Inventory were logically divided into
two equivalent half-tests. The responses of the 1045 pupils were used
for

























TABLE I

VARIANCE ESTIMATES FOR I, W, M, AND D SCORES: BOYS (N = 584) AND GIRLS (N = 452)*

                              Sum of            Variance           Standard   Variance
Variable  Sample  Source      Squares     df    Estimate   r_xx    Error      Ratio**

I         Boys    Between      269,876    37    7293.94                       2.18
                  Within     1,825,848   546    3344.04
          Girls   Between      196,603    33    5957.67                       2.54
                  Within       979,399   418    2343.06

W         Boys    Between       27,863    37     753.05
                  Within        48,959   546      89.67
          Girls   Between       41,408    33    1254.79
                  Within        37,034   418      88.60

M         Boys    Between       24,703    37     667.65
                  Within        42,853   546      78.49
          Girls   Between       25,188    33     763.27
                  Within        34,865   418      83.41

D         Boys    Between       13,806    37     373.14    .78                4.53
                  Within        45,005   546      82.43
          Girls   Between       14,958    33     453.27    .80     9.6        4.90
                  Within        38,651   418      92.47

* This N for the girl sample is 9 less than the original N, as a result
  of the elimination of 4 classes with fewer than 5 girls per class. The
  total number of girl classes is 34, while the boy classes remain at 38.

**All F’s are significant at .001 level.
TABLE II

COMPARISON OF THE TOTAL BOY AND THE
TOTAL GIRL MEANS FOR W, M, AND D*

Variable   Sample   Mean     N     S. D.

W          Boys     42.4    584
           Girls    43.0    452

M          Boys     42.9    584
           Girls    42.3    452

D          Boys     37.1    584    10.0
           Girls    35.7    452    10.9

*No CR’s are significant at the .01 level.


correlating the split-halves scores, and when the correction is made
with the Spearman-Brown formula, a reliability coefficient of .97 is
obtained for the whole criterion scale. This high value indicates
internal consistency of the science interest items, and consistency of
each pupil’s responses.
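
The Spearman-Brown correction can be sketched as follows. The article reports only the corrected coefficient of .97; the half-test correlation printed by the sketch is back-calculated from that figure and is an inference, not a value given in the source. Function names are hypothetical.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimated reliability of a test lengthened by `factor` (2 = whole test)."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


def implied_half_correlation(r_full: float) -> float:
    """Half-test correlation that would yield r_full after the correction."""
    return r_full / (2.0 - r_full)


r_half = implied_half_correlation(0.97)
print(f"implied correlation between half-tests: {r_half:.2f}")
print(f"corrected whole-scale reliability: {spearman_brown(r_half):.2f}")
```

The correction always raises the coefficient, since the whole scale contains twice as many items as either half.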

Do Boys and Girls Have Equally Strong Science Interests? In the
original treatment of the data, this test was the first order of
business, since a significant difference in boy and girl I scores would
necessitate control of the sex variable. The mean I score of the 584
boys is 130.3; the mean I score for the 461 girls is 103.8. From Table
III a variance ratio of 57.0 is obtained, and the statistical
hypothesis of no difference is rejected at the .001 level.





TABLE III

VARIANCE ESTIMATES FOR TESTING SEX
DIFFERENCE IN MEAN I SCORES

            Sum of                Variance    Var.
Source      Squares       df      Estimate    Ratio*

Between       181,637        1    181,637     57.0
Within      3,322,353     1043      3,185

*Significant at the .001 level.


The difference between the two mean scores is so large that pooling of
girl and boy scores would lead to major distortion in the analysis
involving classes. Therefore, the total sample is separated into two
sub-samples, one for the boys and one for the girls. At this point the
total number of girls is reduced from 461 to 452, after elimination of
four classes with fewer than five girls per class. The number of boy
classes is 38; the number of girl classes is 34.

Is There a Significant Relationship Between Father’s Interest in
Science and Pupil’s Interest in Science? Table IV contains data on the
product-moment correlation coefficient for F and I. The obtained r_FI
for boys (.40) and r_FI for girls (.27) are based on total samples.
Application of the significance test indicates both r’s to be
significant at the .001 level.








TABLE IV

RELATIONSHIP OF F AND I

                                 Variance
Sample                  r_FI     Ratio*

Total, Boys (N=584)     .40      108.4
Total, Girls (N=452)    .27       34.4

*Both ratios are significant at .001 level.
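
One common way to obtain a variance ratio for a product-moment correlation is F = r²(N - 2)/(1 - r²), with 1 and N - 2 degrees of freedom. The sketch below assumes this is the test behind Table IV; the article does not give the formula, and the function name is hypothetical.

```python
def variance_ratio_for_r(r: float, n: int) -> float:
    """F ratio (1 and n - 2 df) for testing a correlation against zero."""
    return r * r * (n - 2) / (1.0 - r * r)


for sample, r, n in (("boys", 0.40, 584), ("girls", 0.27, 452)):
    print(f"{sample}: F = {variance_ratio_for_r(r, n):.1f}")
```

Because the correlations are entered here rounded to two decimals, the results (about 111 and 35) only approximate the printed ratios of 108.4 and 34.4.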


In Studying Teacher Effects, Does the Influence of F Upon I Need to be
Controlled? The obtained significant relationships of r_FI pose the
problem of whether there is sufficient attenuation of I scores for
classes, attributable to differences among F scores for classes, to
weigh in the choice of methods of analysis. Table V presents the
requisite variance estimates for the test of significant differences
among the boys’ F scores for classes, and also among the girls’ F
scores for classes. The obtained variance ratios for boys of 1.56, and
for girls of 1.62, are not significant at the .01 level, but are at the
.05. The statistical hypothesis of no differences among F scores for
classes is accepted, with some reservations.








TABLE V

VARIANCE ESTIMATES FOR TESTING DIFFERENCES
AMONG F SCORES FOR CLASSES

                    Sum of            Variance    Var.
Source              Squares    df     Estimate    Ratio*

Between, Boys         78.0      37      2.10      1.56
Within, Boys         736.9     546      1.35

Between, Girls        64.7      33      1.96      1.62
Within, Girls        506.5     418      1.21

*Neither ratio is significant at the .01 level, but
both are significant at .05.





McNemar (27:354), in discussing procedures 
for adjusting scores by the covariance method, 
writes: 


.... But, if the within-group correlation is 
low and/or there is only a small chance 
difference between the groups on the uncon- 
trolled variable, the use of the covariance 
adjustment may not be worth the effort.... 


The moderate strength of the relationship of F and I, and the lack of
significant differences at the .01 level among F scores for classes,
demonstrate that a very limited amount of the variability of I scores
for classes can be reduced by correcting for differences due to F.
Therefore, the attenuation of I scores for classes, due to the
influence of F, was considered insufficient to dictate subsequent
methods of analysis.

Does the Father’s Science Interest Have a Greater Influence on Son’s
Interest Than on Daughter’s? A test of any significant difference in
the r_FI for the boy sample (.40) and the girl sample (.27) can be made
using Fisher’s r to z transformation. The resulting critical ratio of
2.33 is significant at the .02 level. The statistical hypothesis of no
significant difference is rejected at the .02 level. The demonstration
of this difference supports the reasoning that, at the ninth-grade
level, sons tend to imitate their fathers somewhat more closely than do
daughters.
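
The critical ratio of 2.33 can be reproduced with the usual r to z procedure. The sketch assumes N = 584 for boys and N = 452 for girls (the totals shown in Table IV) and the standard error sqrt(1/(N1 - 3) + 1/(N2 - 3)); the function name is hypothetical.

```python
import math


def fisher_z_critical_ratio(r_1: float, n_1: int, r_2: float, n_2: int) -> float:
    """Critical ratio for the difference between two independent r's."""
    z_1, z_2 = math.atanh(r_1), math.atanh(r_2)  # Fisher's r to z
    se_diff = math.sqrt(1.0 / (n_1 - 3) + 1.0 / (n_2 - 3))
    return (z_1 - z_2) / se_diff


cr = fisher_z_critical_ratio(0.40, 584, 0.27, 452)
print(f"critical ratio = {cr:.2f}")
```

A ratio of this size lies between the normal deviates for the .02 and .01 levels (2.33 and 2.58), which is why the difference is reported as significant at .02 rather than .01.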

Is There a Significant Relationship Between D and I and is it
Non-Linear? Values for the linear correlation and for the correlation
ratio are in Table VI. The relevant variance estimates for











TABLE VI

RELATIONSHIP OF D AND I

Type of                  Strength of Relationship*
Relationship             Boys (N=584)    Girls (N=461)**

Linear (r)               .04             .04
Correlation Ratio (eta)  .13             .19

* None of the relationships is significant at the .01 level.

**This analysis, based on the coded scores of the total samples, was
  made prior to the major analysis involving raw scores and computer
  techniques. At this stage, the reduction of the girls’ N to 452 had
  not yet been made.


determining eta, and the tests for significance of eta and of linearity
of regression, are shown in Table VII. The ratio of variance estimates
for between-array means and for within arrays provides a test for the
significance of eta. The obtained F’s of .81 for the boy sample and
1.43 for the girl sample indicate large within-arrays variance. For
both the boy and the girl samples, the linear relationship and the
correlation ratio are not significant at the .01 level. A further test
for significance of curvilinearity would be meaningless.
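
The correlation ratio and its F test can be sketched as below. The data are hypothetical and deliberately shaped as an inverted U, so that the linear r is zero while eta is large; they are not the study’s demand scores, and the function name is an assumption.

```python
import math


def correlation_ratio(arrays):
    """Return (eta, f_ratio) for criterion scores grouped into arrays of x."""
    n = sum(len(a) for a in arrays)
    k = len(arrays)
    grand_mean = sum(sum(a) for a in arrays) / n

    # Sum of squares of array means about the grand mean.
    ss_between = sum(len(a) * (sum(a) / len(a) - grand_mean) ** 2 for a in arrays)
    # Total sum of squares of the criterion scores.
    ss_total = sum((y - grand_mean) ** 2 for a in arrays for y in a)
    ss_within = ss_total - ss_between

    eta = math.sqrt(ss_between / ss_total)
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return eta, f_ratio


# Hypothetical interest scores at low, moderate, and high demand.
arrays = [[1, 2, 3], [4, 5, 6], [1, 2, 3]]
eta, f_ratio = correlation_ratio(arrays)
print(f"eta = {eta:.2f}, F = {f_ratio:.2f}")
```

In the study itself both eta’s were small and their F’s non-significant, so no curvilinear trend of this kind was found.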

While the predicted positive effect of mild tension was not
demonstrated, it is important to emphasize that this research did not
demonstrate the negative relationship between tension and pupil gain
traditionally hypothesized. This consistent lack of a relationship
between teacher demand and the broad school objective of pupils’
interest in science may encourage other researchers to investigate the
relationship of teacher demand to more specific criteria, such as
subject matter achievement. There is the evidence from this research
that the more general goal of interest will not necessarily be
adversely affected as a concomitant of the teacher’s use of demand.

What is the Relationship Between M and I? In this study, the strongest
relationship between any teacher behavior variable and the criterion is
between M and I. The most accurate measure of the relationship based on
individual pupil scores is provided by the within-classes r_MI, which
corrects for possible spuriousness evidenced by between-classes
correlation. Tables VIII and IX show the within-classes r_MI values:
.32 for boys and .42 for girls, both significant at the .001 level.
Also from Tables VIII and IX, the between-classes r_MI for boys (.40)
is significant at the .05 level; the between-classes r_MI for girls
(.53) is significant at the .01 level.

There is no significant difference at the .01 level between these
correlations of M and I for the boy sample and the comparable
correlations of M and I for the girl sample.
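
The total, within-classes, and between-classes coefficients follow directly from the sums of products and sums of squares. The sketch below uses the boys’ values for M and I reported in Table VIII, with r = A / sqrt(B·C) and b = A / C; the dictionary layout and function name are hypothetical.

```python
import math

# Boys: A = sum of products (M, I), B = sum of squares (I), C = sum of squares (M).
boys = {
    "total": {"A": 123_118, "B": 2_095_724, "C": 67_556},
    "within": {"A": 90_263, "B": 1_825_848, "C": 42_853},
    "between": {"A": 32_855, "B": 269_876, "C": 24_703},
}


def r_and_b(A: float, B: float, C: float):
    """Correlation coefficient and regression coefficient of I on M."""
    return A / math.sqrt(B * C), A / C


for source, sums in boys.items():
    r, b = r_and_b(**sums)
    print(f"{source:8s} r_MI = {r:.2f}   b_MI = {b:.2f}")
```

The total sums are the within-classes and between-classes sums added together, which is what allows the one set of data to be partitioned in this way.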

For the form of the relationship, a test of the hypothesis of linear
regression for the total r_MI can be made through the formula

\[ F = \frac{S_a^2}{S_w^2}, \qquad n_1 = k - 2, \quad n_2 = N - k, \qquad k = \text{number of arrays} \]

where S_a^2 represents the variance estimate based on the deviation of
array means from the line and S_w^2 represents the variance estimate
based on the deviation of individual scores within arrays. Values for
the F ratios of the two samples are in Table X. Since both F’s are less
than unity, the hypothesis of linear regression is accepted for each
sample.

Two of the variance estimates of Tables XI and XII may be used to
provide an F ratio to test for linearity of regression based on the
means of M and I (between-classes r_MI):
TABLE VIII

ANALYSIS OF VARIANCE-COVARIANCE, FOR FUNCTIONS OF M AND I: BOYS

                               Total                       Within                      Between

Sum of Products, M and I       A_t = 123,118               A_w = 90,263                A_b = 32,855
Sum of Squares, I              B_t = 2,095,724             B_w = 1,825,848             B_b = 269,876
Sum of Squares, M              C_t = 67,556                C_w = 42,853                C_b = 24,703
df for Variances               N - 1 = 583                 N - k = 546                 k - 1 = 37
Correlation Coefficient, r_MI  A_t/sqrt(B_t C_t) = .33     A_w/sqrt(B_w C_w) = .32     A_b/sqrt(B_b C_b) = .40
df for r_MI                    N - 2 = 582                 N - k - 1 = 545             k - 2 = 36
Significance Level for r_MI    Sig. at .001 level          Sig. at .001 level          Sig. at .05 level
Regression Coefficient, b_MI   A_t/C_t = 1.82              A_w/C_w = 2.11              A_b/C_b = 1.33


TABLE IX

ANALYSIS OF VARIANCE-COVARIANCE, FOR FUNCTIONS OF M AND I: GIRLS

                               Total                       Within                      Between

Sum of Products, M and I       A_t = 114,954               A_w = 77,731                A_b = 37,223
Sum of Squares, I              B_t = 1,176,002             B_w = 979,399               B_b = 196,603
Sum of Squares, M              C_t = 60,053                C_w = 34,865                C_b = 25,188
df for Variances               N - 1 = 451                 N - k = 418                 k - 1 = 33
Correlation Coefficient, r_MI  A_t/sqrt(B_t C_t) = .43     A_w/sqrt(B_w C_w) = .42     A_b/sqrt(B_b C_b) = .53
df for r_MI                    N - 2 = 450                 N - k - 1 = 417             k - 2 = 32
Significance Level for r_MI    Sig. at .001 level          Sig. at .001 level          Sig. at .01 level
Regression Coefficient, b_MI   A_t/C_t = 1.91              A_w/C_w = 2.23              A_b/C_b = 1.48
\[ F = \frac{S_m^2}{S_w^2}, \qquad n_1 = k - 2, \quad n_2 = N - k - 1 \]

The obtained variance ratios of 2.09 for boys and 2.29 for girls are
significant at the .001 level. This demonstrates that the estimate of
the variance of class means about the regression line based on means
(S_m^2) is significantly larger than the estimate of the variance of
individual scores about the regression line of common within-classes
slope (S_w^2). Therefore, the hypothesis of linearity of regression
based on means is rejected for both the boy and the girl sample.
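
These two variance ratios can be recovered from the sums of products and sums of squares in the analysis of variance-covariance for M and I. The sketch assumes each variance estimate is the residual sum of squares B - A²/C divided by its degrees of freedom (k - 2 for class means, N - k - 1 for individual scores about the common within-classes slope); the function names are hypothetical.

```python
def residual_ss(A: float, B: float, C: float) -> float:
    """Sum of squares of I left after fitting the regression of I on M."""
    return B - A * A / C


def linearity_of_means_f(between, within, k: int, n: int) -> float:
    """Variance of class means about their line over within-classes residual."""
    s_means = residual_ss(*between) / (k - 2)
    s_within = residual_ss(*within) / (n - k - 1)
    return s_means / s_within


# (A, B, C) for between and within classes, boys then girls.
f_boys = linearity_of_means_f(
    (32_855, 269_876, 24_703), (90_263, 1_825_848, 42_853), k=38, n=584
)
f_girls = linearity_of_means_f(
    (37_223, 196_603, 25_188), (77_731, 979_399, 34_865), k=34, n=452
)
print(f"boys: F = {f_boys:.2f}; girls: F = {f_girls:.2f}")
```

Under these assumptions the computation gives the ratios of 2.09 and 2.29 quoted above.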

Are the 38 Separate Class Slopes b_MI for Boys Equivalent, and are the
34 Separate Class Slopes for Girls Equivalent? Tables XI and XII
provide the necessary variance estimates to test the hypothesis that
slope b_MI of the within-classes regression line is equivalent for all
classes of a given sex sample. The formula

\[ F = \frac{S_s^2}{S_r^2}, \qquad n_1 = k - 1, \quad n_2 = N - 2k \]

results in an F of .54 for the boy sample. A similar test for the girl
sample results in an F that is also less than unity. The estimate of
the variance of class b’s about the common coefficient (S_s^2) is less
than the estimate of the variance of the individual scores about the
regression line of each class (S_r^2). The hypothesis of a common
within-classes slope for all boy classes, and a common within-classes
slope for all girl classes, is accepted. For boys, this common
within-classes b_MI is 2.11; for girls, the common within-classes b_MI
is 2.23 (Tables VIII and IX).

The consistency of the relationship of the teacher’s utilization of
intrinsic motivation and pupils’ interest in science can be appreciated
from the above demonstration that in each sample (for boys and for
girls) the highly significant within-classes slope is common for all
classes. That is, any variations among the 38 separate class slopes
b_MI for boys, and among the 34 separate class slopes b_MI for girls,
can be attributed to sampling error.

Figure 1 illustrates for the girl sample the
regressions for the total slope, for the common
within-classes slope, and for the slope of the
class means. The variations⁵ of the 34 separate
class slopes from the common within-classes
slope are shown. Also indicated is the scatter of
the class means about the regression line deter-
mined by those means. The girl sample is select-
ed for illustration because it provides one more
highly significant regression (for between-class-
es) than was demonstrated for the boy sample.
For simplicity, the slope for the class means is
shown here as a linear regression, although it
has been established that the actual regression is
curvilinear.

The three regression equations, and values for
the appropriate slopes, are summarized in Table
XIII. The equation for the 34 separate class slopes
is given; the separate values are too detailed to
list.

Are There Factors, in Addition to M and Other
Controlled Variables, Which are Affecting the Cri-
terion Scores ?—To resolve this question it is nec-
essary to test the hypothesis that a single regres-
sion line fits both total and between-classes pop-
ulations. To test this hypothesis, the following
variance ratio is appropriate:

$$F = \frac{S_b^2}{S_w^2}, \qquad n_2 = N - k - 1$$

The relevant variance estimates for the two
samples are presented in Tables XI and XII. The
F ratios for boys (2.12) and for girls (2.10) are
both significant at the .001 level. Therefore, the
estimate of the variance of group means about the
regression line of the common within-classes
slope ($S_b^2$) is significantly larger than the esti-
mate of the variance of individual scores about the
regression line of common within-classes slope
($S_w^2$). The hypothesis of a common regression
line for all populations is rejected for both sam-
ples, since acceptance would require that $S_b^2$ and
$S_w^2$ be similar.

The lack of a common slope means that with
respect to classes there are factors other than
teachers’ utilization of intrinsic motivation which
account for some of the variation in interest scores
among the classes. Certain of these attenuating
factors had been anticipated prior to the collection
of the data, and various methods of control were
included in the design to eliminate or reduce their
possible effects (sex of pupil, father’s science in-
terest, subject matter, and grade level). Many
other possible forces were beyond the scope of
this study, such as additional characteristics of
the teachers, school atmosphere, and previous
experience of the pupils.

What is the Relationship Between W and I?⁶—
The product moment correlation coefficients for
W and I are calculated from the information in
Tables XIV and XV. The within-classes $r_{WI}$ val-
ues are .20 for boys, and .28 for girls, both sig-
nificant at the .001 level.
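
The computation behind these coefficients can be illustrated with a minimal Python sketch. It assumes the product moment formula r = A / (√B √C), with A the sum of products and B and C the sums of squares, and it uses the boys' entries of Table XIV as they are legible in this copy. Reading 76,822 as the total and 48,959 as the within-classes sum of squares for W is an assumption; the function name `correlation` is hypothetical.

```python
import math

def correlation(sum_products, sum_sq_x, sum_sq_y):
    """Product moment r from a sum of products and two sums of squares."""
    return sum_products / math.sqrt(sum_sq_x * sum_sq_y)

# Boys, W and I (Table XIV). Column assignment of the W sums of
# squares is an assumption made for this illustration.
r_total = correlation(80727, 76822, 2095724)
r_within = correlation(60703, 48959, 1825848)

print(round(r_total, 2), round(r_within, 2))
```

Under these assumptions both coefficients round to the .20 reported for boys.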

The between-classes $r_{WI}$ for boys of .23 is not
significant at the .01 level. The higher between-
classes $r_{WI}$ for girls of .53 is significant at the
.01 level.

A test of the difference between the boy and the
girl within-classes $r_{WI}$, using Fisher’s r to z
transformation, results in a CR of 1.35. This
ratio is not significant at the .01 level. The hy-
pothesis that the two r’s are not significantly dif-
ferent is accepted. A test of the difference be-
tween the boy and the girl between-classes $r_{WI}$
results in a CR of 1.44, which is not significant

FIGURE 1


RELATIONSHIP BETWEEN M AND I (GIRLS)


TABLE XIII


REGRESSION EQUATIONS FOR M AND I—GIRLS


Source of Regression       Equation                     Slope

Total                      Ī + b_t(M_i - M̄)             b_t = 1.91
Common Within-Classes      Ī_r + b_w(M_ir - M̄_r)        b_w = 2.23
Between Classes            Ī + b_b(M̄_r - M̄)             b_b = 1.48
Any Specific Class         Ī_r + b_r(M_ir - M̄_r)        b_r = (34 separate values)


TABLE XIV


ANALYSIS OF VARIANCE:COVARIANCE, FOR FUNCTIONS OF W AND I—BOYS


                           Total                Within               Between

Sum of Products,
W and I                    A_t = 80,727         A_w = 60,703         A_b = 20,024

Sum of Squares, I          B_t = 2,095,724      B_w = 1,825,848      B_b = 269,876
Sum of Squares, W          C_t = 76,822         C_w = 48,959
df for Variances           N - 1 = 583

Correlation Coefficient,
r_WI = A / (√B √C)         .20
df for r_WI                N - 2 = 582

Significance Level
for r_WI                   Sig. at .001 level   Sig. at .001 level

at the .01 level.

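The critical ratios reported for the comparison of boy and girl correlations can be approximated with a short Python sketch of Fisher's r to z procedure. The sample sizes are assumptions: the pupil counts (584 boys, 452 girls) are used for the within-classes comparison and the class counts (38 and 34) for the between-classes comparison, since the exact degrees of freedom used by the author are not stated. The function names are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's r to z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)

def critical_ratio(r1, n1, r2, n2):
    """Difference of two transformed r's divided by its standard error."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return abs(fisher_z(r1) - fisher_z(r2)) / standard_error

# Assumed sample sizes: pupils for within-classes, classes for between-classes.
cr_within = critical_ratio(0.20, 584, 0.28, 452)
cr_between = critical_ratio(0.23, 38, 0.53, 34)

print(round(cr_within, 2), round(cr_between, 2))
```

With these assumed sample sizes the two ratios come out close to the reported values of 1.35 and 1.44, both below the 2.58 required for significance at the .01 level in a two-tailed normal test.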
Can Prediction of I be Increased Through a
Study of the Relative Contributions of W, M, D,
and F?—The relative contribution of each of the
antecedent variables to the prediction of the cri-
terion variable I can be observed in Table XVI
(products of β’s and r’s). For both the boy and
the girl samples, the two variables of M and F
contribute the most weight, with F the more pre-
dictive for boys and M the more for girls.

The value of R (multiple correlation) is .50
for the boy and for the girl sample. The obtained
F for boys is 37.9, and for girls is 29.2; both R’s
are significant at the .001 level.

While the many origins of pupils’ interest in 
science do not constitute the central problem of 
this study, it is worth noting that 25% of the var- 
iance in pupils’ science interest scores can be ac- 
counted for by a knowledge of the variation in 
scores for the teacher’s utilization of intrinsic 
motivation and for the father’s interest in sci- 
ence. 

Does the Effect of One Teacher Variable Upon
I Vary as a Function of Other Teacher Vari-
ables ?—In Tables XVII and XVIII, four cells have
been formed by a double classification based on
class mean scores for high and low motivation,
and for moderate and low demand (there were no
high D scores for classes). The cell values are
derived from the corresponding $I_a$ scores for
classes. The variations of these cell means sug-
gest that the differences in $I_a$ scores are due not
only to the effects of high and low M, but also to
some other influence. As D has no significant
correlation with I, it is possible that some of the
differences in the cell means are due to the inter-
action of M and D.

A test of the hypothesis of interaction of M and
D is made through complex analysis of variance.
Prior to this analysis, two steps were taken. First,
to account for as much within-cell variability of
I scores for classes as possible, each I score
was adjusted for differences due to F scores for
classes,⁷ using the regression equation for ad-
justed scores:

$$\bar{I}_{ra} = \bar{I}_r - b_{IFw}(\bar{F}_r - \bar{F})$$


The reduction in variability can be partially gauged
by comparing the unadjusted with the adjusted sum
of squares for I, as summarized in Tables XIX
and XX.
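
The adjustment of class I scores for differences in F can be illustrated with a minimal Python sketch. It assumes the usual covariance adjustment, in which each class mean is moved along the regression slope to the grand mean of F. The class means and the slope below are hypothetical and are not taken from the study; the grand mean is computed as the unweighted mean of class means, which is also an assumption.

```python
def adjust_class_means(i_means, f_means, slope):
    """Adjust each class I mean for its departure from the grand F mean."""
    if len(i_means) != len(f_means):
        raise ValueError("need one F mean for every I mean")
    # Assumption: grand mean taken as the unweighted mean of class means.
    grand_f = sum(f_means) / len(f_means)
    return [i - slope * (f - grand_f) for i, f in zip(i_means, f_means)]

# Hypothetical class means and slope, for illustration only.
i_means = [120.0, 131.0, 112.0]
f_means = [4.0, 6.0, 2.0]
adjusted = adjust_class_means(i_means, f_means, slope=2.5)

print(adjusted)
```

A class whose F mean equals the grand mean is left unchanged, while classes above or below it are moved down or up in proportion to the slope.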

In the second prior step, correction for dis- 
proportionality of cell frequencies was made 
through procedures outlined by Wert, Neidt, and 
Ahmann (36). The adjustment term is added to 
the sum of squares for M and D, and is subtract- 
ed from the sum of squares for interaction. 

Table XXI summarizes the interaction data. 
The variance ratios to test the main effects and 
to test interaction are formed by using the within
variance estimate as the denominator term. As
indicated by previous correlation analysis of M
and I scores for classes, the main effect of mo-
tivation is significant for girls at the .01 level,
and for boys at the .05 level. In both samples, 
the non-significant variance ratio for the main ef- 
fect of demand is also to be expected from pre- 
vious analysis. Since neither variance ratio for 
interaction is significant at the .01 level, the hy-
pothesis of no interaction of M scores for classes
with D scores for classes is accepted for both 
samples. If the interaction effect does exist, it 
is not sufficiently strong to be demonstrated with 
the number of classes available for study. 
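
The way each variance ratio is formed can be shown with a brief Python sketch: every effect's variance estimate (sum of squares divided by its degrees of freedom) is divided by the within variance estimate. The girls' entries of Table XXI are used as they are legible in this copy; taking one degree of freedom for each effect is an assumption, and the function name is hypothetical.

```python
def variance_ratios(effects, within_ss, within_df):
    """Divide each effect's variance estimate by the within variance estimate."""
    within_estimate = within_ss / within_df
    return {
        name: (sum_sq / df) / within_estimate
        for name, (sum_sq, df) in effects.items()
    }

# Girls, Table XXI: (adjusted sum of squares, assumed df) per source.
girls = {
    "Motivation": (2950.1, 1),
    "Demand": (1163.6, 1),
    "Interaction": (704.5, 1),
}
ratios = variance_ratios(girls, within_ss=9315.5, within_df=30)

for name, ratio in ratios.items():
    print(name, round(ratio, 2))
```

Only the ratio for motivation is large relative to the within estimate, which agrees with the pattern of significance described above.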

Is There an Interaction Effect of M and D, Us- 
ing Pupil I Scores as Criterion?—Increased cell 
frequencies may be obtained by using pupil scores 
as a basis for further study of interaction of M 
with D, using cells formed essentially as in Ta- 
bles XVII and XVIII. Individual I scores are not 
adjusted for differences due to the influence of F, 
since control by covariance adjustment is appro- 
priate only with groups. Adjustment is made for 
disproportionality of cell frequencies. The var- 
iance estimates for this test of interaction are 
presented in Table XXII. For both samples, the 
variance ratio is significant at the .001 level for 
the main effect of M. The main effect of D is not
significant at the .01 level for either sample. 
The variance ratio for interaction is less than 
unity in both samples, demonstrating large with- 
in-cell variance. The hypothesis of no interac- 
tion is accepted. 

A similar series of tests for interaction of W 
and D, using pupil I scores as the criterion, re- 
sulted in acceptance of the hypothesis of no inter- 
action for this combination of variables. 

Does Sex of Teacher, Interacting with Sex of
Pupil, have an Effect on Pupils’ I Scores ?—As the
sample includes both women teachers (N = 10) and
men teachers (N = 28), it is possible to test for
differences in pupil I scores attributable to teach-
er sex. (The test of significance for the main ef-
fect of pupil sex has been discussed previously.)
Table XXIII shows the variance estimates for test-
ing interaction and the main effect of teacher sex.
The hypothesis of no difference due to teacher
sex is accepted, since the obtained variance ratio
is less than unity. The hypothesis of no differ-
ence due to interaction is accepted, since the ob-
tained F of 2.07 is not significant at the .01 level.

Does Sex of Teacher, Interacting with Sex of 
Pupil, have an Effect on Pupil Perceptions of 
Teacher Behaviors ?—Tests for any differences in 
pupil perceptions of men and women teachers for 
D, M, and W can be made using the values in Ta- 
ble XXIV. (The lack of significant differences in 
perceptions of D, M, and W due to pupil sex has 
been previously discussed.) The obtained vari- 
ance ratios for the main effects of teacher sex 
and for interaction are not significant at the .01

TABLE XVII


CLASSIFICATION OF I_a SCORES FOR CLASSES,* BASED
ON TWO LEVELS OF M SCORES FOR CLASSES AND TWO
LEVELS OF D SCORES FOR CLASSES—BOYS (N = 38)


[Two-by-two table of cell means and case counts. Legible entries: 11 cases; I_a = 130.7; I_a = 127.4; 8 cases; 11 cases.]


*The I scores for classes have been adjusted for class dif-
ferences due to F.


TABLE XVIII


CLASSIFICATION OF I_a SCORES FOR CLASSES,* BASED
ON TWO LEVELS OF M SCORES FOR CLASSES AND TWO
LEVELS OF D SCORES FOR CLASSES—GIRLS (N = 34)


[Two-by-two table of cell means and case counts. Legible entries: Motivation; Low; 10 cases; I_a = 114.4; 7 cases; 10 cases.]


*The I scores for classes have been adjusted for class dif-
ferences due to F.

TABLE XIX


ANALYSIS OF VARIANCE:COVARIANCE, FOR FUNCTIONS OF F AND I—BOYS


                              Total               Within              Between

Sum of Products, F and I      A_t = 16,357                            A_b = 2,515
Sum of Squares, I             B_t = 2,095,724     B_w = 1,825,848     B_b = 269,876
Sum of Squares, F             C_t = 815                               C_b = 78
df for Variances              N - 1 = 583
Correlation Coefficient, r_FI
Regression Coefficient, b_IF
Adjusted Sum of Squares, I    1,767,360  minus    1,565,839  equals   Adjusted B_b = 201,521
df for Adjusted Sum
of Squares                                        N - k - 1 = 545


TABLE XX


ANALYSIS OF VARIANCE:COVARIANCE, FOR FUNCTIONS OF F AND I—GIRLS


                              Total               Within              Between

Sum of Products, F and I                                              A_b = 1,192
Sum of Squares, I             B_t = 1,176,002     B_w = 979,399       B_b = 196,603
Sum of Squares, F                                                     C_b = 65
df for Variances                                                      k - 1 = 33
Correlation Coefficient, r_FI                                         .70
Regression Coefficient, b_IF                                          18.4
Adjusted Sum of Squares, I    1,092,554  minus    914,983  equals     Adjusted B_b = 177,571
df for Adjusted Sum
of Squares                    N - 2 = 450

TABLE XXI


VARIANCE ESTIMATES FOR TEST OF INTERACTION OF M WITH D SCORES FOR CLASSES—
BOYS (N = 38) AND GIRLS (N = 34)


               Adjusted Sum            Variance    Variance    Level of
Source         of Squares*     df      Estimate    Ratio       Significance

Boys
Motivation     1882.0          1       1882.0      6.15        Sig. at .05 level
Demand         492.1           1       492.1       1.61        Not sig. at .01 level
Interaction    604.4                   604.4       1.98        Not sig. at .01 level
Within         10401.

Girls
Motivation     2950.1                  2950.1                  Sig. at .01 level
Demand         1163.6                  1163.6                  Not sig. at .01 level
Interaction    704.5           1       704.5                   Not sig. at .01 level
Within         9315.5          30      310.5


*Sum of squares adjusted for disproportionality of cells.


TABLE XXII


VARIANCE ESTIMATES FOR TEST OF INTERACTION OF M WITH D SCORES—
BOYS (N = 584) AND GIRLS (N = 452)


               Adjusted Sum            Variance    Variance    Level of
Source         of Squares*     df      Estimate    Ratio       Significance

Boys
Motivation     148,054                 148,054     44.14       Sig. at .001 level
Demand         11,165                  11,165      3.33        Not sig. at .01 level
Interaction    2,110                   2,110       .63         Not sig. at .01 level
Within         1,945,123

Girls
Motivation     123,814                 123,814     53.18       Sig. at .001 level
Demand         222                     222                     Not sig. at .01 level
Interaction    868             1       868                     Not sig. at .01 level
Within         1,043,112       448     2,328


*Sum of squares adjusted for disproportionality of cells.

TABLE XXIV


VARIANCE ESTIMATES FOR TESTING EFFECTS OF TEACHER SEX ON BOY AND
GIRL PERCEPTIONS OF D, M, AND W (N = 1036)


               Adjusted Sum            Variance    Variance    Level of
Source         of Squares*     df      Estimate    Ratio       Significance

D
Teacher Sex                            314                     Not sig. at .01 level
Pupil Sex      488                                             Not sig. at .01 level
Interaction    3                       3                       Not sig. at .01 level
Within

M
Teacher Sex                                                    Not sig. at .01 level
Pupil Sex                                                      Not sig. at .01 level
Interaction                                                    Not sig. at .01 level
Within

W
Teacher Sex                            5959                    Sig. at .001 level
Pupil Sex      41                                              Not sig. at .01 level
Interaction    409             1       409                     Not sig. at .01 level
Within         148,896         1032    144


*Sum of squares adjusted for disproportionality of cells.


TABLE XXV


CLASSIFICATION OF W SCORES, BASED ON TEACHER
SEX AND PUPIL SEX (N = 1036)


                          Teacher Sex
Pupil Sex        Male               Female

Boys             W̄ = 43.50          W̄ = 39.12
                 439 cases          145 cases

Girls            W̄ = 44.61          W̄ = 37.23
                 351 cases          101 cases

level with one important exception: the main ef-
fect of teacher sex with W as the criterion. The
obtained variance ratio of 41.30 for differences
in pupils’ perceptions of W attributable to teach-
er sex is significant at the .001 level. Table XXV
demonstrates that both boys and girls tend to rate
the women teachers considerably lower than men
teachers on W.

One explanation for this finding is that children 
are less accustomed to close, frequent rapport 
with men, and feel that men are less approach- 
able. Objectively similar warmth behaviors of 
men and of women teachers may be so perceived 
that pupils over-estimate the strength of the W 
behaviors of the men teachers. Or perhaps in this
sample the men teachers had, in general, warm- 
er personalities than did the women. 

What Implications can be Drawn from the High
Intercorrelation of W and M?—In the construction
of the scales measuring the teacher behavior var-
iables of W and M, an attempt was made to dis-
tinguish clearly between them. As previously
discussed, competent judges did distinguish be-
tween the two variables. Yet the intercorrela-
tion of W and M (.66 for boys and .70 for girls)
and the results of partial and multiple correlation
analysis indicate that W scores have little pre-
dictive power that is not also available through
scores for M, but not vice versa. It could be sug-
gested that pupils did not distinguish between M
and W, due either to inadequate scale construc-
tion or to pupils’ subjective halo effects in their
perceptions of teachers. However, the fact that
in responding to the W items, both boys and girls
rated women teachers as significantly lower on
W than men teachers, but did not so differentiate
as regards M items, indicates that pupils did dis-
tinguish between warmth and intrinsic motivation
behaviors. That is, the tendency for pupils to
rate a given teacher somewhat similarly on the
variables of warmth and of intrinsic motivation
was due, in part at least, to the correspondence
within the teacher of these two types of behaviors.
This may be interpreted to mean that in the train-
ing of teachers, instructions in the utilization of
intrinsic motivation are more likely to be suc-
cessful if the student teacher already possesses
the attributes of warmth, or can be helped to ac-
quire them.











Summary and Conclusions 





This research was structured within the gen- 
eral pattern of teacher competence studies, with 
the purpose of identifying some teacher behaviors 
that relate to desirable pupil learning. The se- 
lection of the teacher antecedent variables (warmth, 
demand, and intrinsic motivation) was based on 
rationale as to teacher behaviors that would logi- 
cally contribute to the changing of pupil behaviors. 
The selection of the consequent, desirable pupil
learning (interest in science), rested upon judg-
ments as to what constitutes important, measur-
able school objectives.

The sample included 1045 ninth-grade boys and 
girls and their 38 general science teachers from 
19 public schools in eastern Massachusetts. 

The design of the research included control of 
four factors which might affect the criterion 
scores for classes: school subject, grade level, 
sex of pupil, and father’s interest in science. 

Pupils within a class agreed closely in their
ratings on the variables of warmth and of intrinsic
motivation, with reliabilities between .88 and .93
for the stability of within-class responses. Pupil
agreement on the variable of demand was less
high, with reliabilities between .78 and .80.

The Science Interest Inventory, a measure of
voluntary science activities, provided the criter-
ion scores. Its reliability, as determined by the
split-half technique and corrected by the Spear-
man-Brown formula, is .97.
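
The Spearman-Brown correction mentioned here can be sketched in a few lines of Python. The formula for a test lengthened by a given factor is standard; the uncorrected correlation between the two halves is not reported in this article, so the sketch only works backward from the corrected value of .97 to the half-test correlation that would imply it. The function names are hypothetical.

```python
def spearman_brown(r_half, factor=2.0):
    """Predicted reliability when a test is lengthened by the given factor."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)

def implied_half_correlation(r_full):
    """Half-test correlation that would yield r_full after doubling."""
    return r_full / (2.0 - r_full)

# The half-test correlation is not given in the article; it is derived here
# from the reported corrected reliability for illustration only.
r_half = implied_half_correlation(0.97)

print(round(r_half, 3), round(spearman_brown(r_half), 2))
```

The correction always raises a positive half-test correlation, because the full-length test contains twice as many items as either half.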

Analysis supported the conclusion that the 
classroom science teacher can expect pupils to 
agree in general as to how they perceive him; he 
can also expect some divergence of individual pu- 
pil impressions. 

A test of the differences among the class (teach-
er-group) means of pupil ratings indicated in the
teacher sample a diversity for each teacher be-
havior variable, significant at the .001 level.

A major difference was found between the means
for the 584 boy and the 452 girl science interest
scores, with boys reporting significantly more ac-
tivities than girls (.001 level).

An hypothesis that the teacher variable of de- 
mand is non-linearly related to pupils’ interest in 
science was rejected. No significant relationship 
was found for these two variables. It was sug- 
gested that the evidence from this research indi- 
cates that tensions over school work are not nec- 
essarily detrimental to such broad learning objec- 
tives as interest in the subject matter. 

An hypothesis that the teacher variable of util- 
ization of intrinsic motivation is positively relat- 
ed to pupils’ interest in science was accepted. 
The within-classes correlation for boys was .32
and for girls was .42, with both significant at the .001
level. Further analysis demonstrated that the 38
separate correlations for boy classes are equiv- 
alent, and the 34 separate correlations for girl 
classes are equivalent; any variations can be at- 
tributed to chance. 

An hypothesis that the teacher variable of
warmth is positively related to pupils’ interest in
science was accepted, with positive and highly
significant (.001) within-classes correlations of
.20 for the boys and .28 for the girls.

A multiple correlation analysis demonstrated 
that 25% of the variance in pupils’ science inter- 
est scores can be accounted for by a knowledge of 
the variation in scores for the teacher’s utiliza-
tion of intrinsic motivation and for the father’s
interest in science. 

It was hypothesized that the interaction of the
teacher variables might weaken or strengthen one
another’s effect on the pupils’ interest in science.
Analysis, however, revealed no significant inter-
action effects.

The findings for both the boy and the girl sam-
ples were remarkably similar throughout the an-
alysis, e.g., in the various correlation tests,
and tests for differences in boy and girl percep-
tions of teacher behaviors. A difference between
boy and girl samples was found in the strength of
the relationship of father’s interest in science to
son’s or daughter’s interest.

Analysis of the intercorrelation of the two
teacher variables of warmth and intrinsic moti-
vation demonstrated a fairly strong relationship.
Other analysis indicated that pupils did distin-
guish between these two variables; it is therefore
reasonable to suggest that these two teacher
characteristics tend to coexist within the teacher.

Analysis indicated that pupils perceive the 
women teachers in this sample as significantly 
lower (.001 level) on warmth than the men teach- 
ers. The number of teachers (38) was too small 
to permit any generalized inferences beyond this 
sample. 

In conclusion, the findings of this research 
seem to warrant the generalization that the sci- 
ence interests of many pupils in this sample are 
independent of the low and moderate demands of 
the teacher, but are a function of the teacher’s 
capacity to establish a relaxed interpersonal re- 
lationship with the pupil, and of the teacher’s 
ability to utilize the educational principle of in- 
trinsic motivation. 


FOOTNOTES


*This article is derived from the author’s doc-
torate thesis, completed at the Harvard Grad-
uate School of Education and entitled Pupils’
Interest in Science as a Function of the Teach-
er Behavior Variables of Warmth, Demand,
and Utilization of Intrinsic Motivation.

1. There is growing evidence that pupil percep-
tions are in close agreement with projective-
type measures of teacher traits. (Projective-
type measures of personality variables are
frequently considered more valid than other
types of instruments.) Teachers’ self-ratings
show low relationships with either of these.

2. For a more detailed discussion of the bases
for this and succeeding hypotheses, the read-
er is referred to the original dissertation.

3. At the ninth-grade level in Massachusetts pub-
lic schools, the science course offered is al-
most exclusively general science.

4. This variable was of an exploratory nature and
is not sufficiently germane to be discussed in
this article.

5. These variations can be attributed to chance.

6. The questions previously asked about the rela-
tionships of M and I can be repeated for the
relationships of W and I. Only the major fea-
tures of the latter are necessary for present
purposes.

7. While the within-classes correlation of F and
I is moderately strong, the differences among
F scores for classes are significant only at
the .05 level. Adjusting for such differences
does not reduce greatly the unaccounted-for
variance of I scores for classes.


THE TRANSFER EFFECT OF A LEARNING
PROGRAM IN SOCIAL CAUSALITY ON AN
UNDERSTANDING OF PHYSICAL CAUSALITY*


ROLF E. MUUSS**
Towson, Maryland


VARIOUS STUDIES have demonstrated that
participation in an experimental learning program
designed to develop an understanding and appre-
ciation of the nature of human behavior and the
dynamic factors operative in social situations
does substantially increase knowledge of social
causality (10, 13, 14, 16, 20). Social causality has
been defined as an awareness of the dynamic com-
plexity of human motivation, and as an under-
standing of the interacting nature of the forces
that operate in human behavior and in social sit-
uations in general. It involves flexibility in think-
ing, and a willingness to see things from the view-
point of others. A causally oriented person will
suspend judgment until sufficient information is
available. He realizes that his behavior has con-
sequences, and that there are alternative ways of
solving most social problems (12). He has a func-
tional understanding of the concept of probability
in respect to measurements and predictions. A
more detailed theoretical discussion of the con-
cept of social causality can be found in the liter-
ature (8). The staff of the Preventive Psychia-
try Program at the State University of Iowa is in
the process of developing a causally oriented
learning program in an attempt to investigate to
what extent social causality contributes to school
adjustment and mental health (12).

The emphasis in the program is on social caus-
ality, that is, an understanding of the multiple
causes of behavior, the effects of behavior, and
the importance of thinking in probability terms
when considering both causes and effects. The
question arises whether Ss who have participated
in a learning program primarily designed to de-
velop a more thorough understanding of social
causality and human motivation also develop a
more thorough understanding of the factors that
operate in the physical world and that help to ex-
plain natural phenomena.

In analogy to the definition of social causality, 
the concept of physical causality might be defined 
as: 


1. An awareness of some of the forces operat- 
ing in the physical world, and the common factors 
that cause an event or a phenomenon in the physi- 
cal world, 

2. An awareness that there are frequently mul-
tiple factors or a complexity of factors involved in
a given event or phenomenon in the physical world,

3. An awareness of some of the common scien-
tific procedures and methods of fact finding,

4. An awareness that knowledge of a given fac-
tor, force or dimension as represented by a meas-
urement of it, is limited since any measure is only
an approximation,

5. An awareness of the fact that if knowledge is
based on approximations, predictions based on this
knowledge are also only approximate,

6. An awareness of the fact that scientific
knowledge grows and may change as new informa-
tion is accumulated.


Hypothesis 


Natural science teaching has not given much at-
tention to the development of thinking in probabil-
ity terms. In a study by Fitzgerald and Ojemann
no correlation was found between scores on a prob- 
ability concepts test and number of semesters of 
high-school natural science study. The authors 
conclude ‘‘...no relationship between an under- 
standing of the dynamic conception of knowledge as 
measured by the test and number of semesters of 
science is indicated.’’(3) In view of this finding 


* The data for this study were collected while the author was on the staff of the Preventive Psychiatry
Program, State University of Iowa. The preparation of this paper was supported by Goucher College
and the Grant Foundation.

**The author is greatly indebted to Dr. R. H. Ojemann for valuable advice in the preparation of the man-
uscript, and to Dr. B. Snider for assistance in the statistical analysis and IBM processing.

and since the experimental causal learning pro-
gram attempts to develop a functional conception
of probability, one might expect some transfer
effect from the social causality learning program
to probability thinking as applied to natural sci-
ence problems. One might also expect some re-
inforcement in considering the complexity of fac-
tors and in the understanding of scientific proce-
dures and methods of fact finding which natural
science teaching has begun to include in its ele-
mentary school teaching content.

Thus, the major hypotheses of this study are: 


1. Subjects who have participated in an exper- 
imental learning program designed to develop an 
understanding and appreciation of the forces that 
operate in human behavior will be more capable 
of solving problems involving an understanding 
of ‘‘causality’’ and ‘‘probability’’ than control 
subjects who have not had special training in these 
concepts. 

2. It is further hypothesized that there is a
transfer effect in that an individual who has
learned to solve social problems dynamically will
also approach phenomena and events in the phys-
ical world with a causal attitude, asking such
questions as: How did this event develop? What
are the factors involved? How accurate are our
measurements? And, what are the probable con-
sequences?


The data gathered also make possible the test- 
ing of some additional hypotheses: 


3. Studies have shown (15, 19) that the environ- 
ment in which children grow up contains several 
influences which teach a non-causal orientation 
toward behavior, while the predominant attitude 
toward natural science is a causal one. It is thus
hypothesized that measurements of intelligence 
correlate less with measurements of social caus- 
ality than with measurements of physical science 
knowledge. 

4. Since intelligence facilitates problem-solv-
ing, the measurements of causality will be corre-
lated to intelligence. Brighter children usually
have a better knowledge of the antecedent factors
that bring about an event; they have a better un-
derstanding of the probabilistic nature of knowl-
edge and the changes in knowledge. It is hypoth-
esized that the correlation between intelligence
and measurements of causality will be greater for
control Ss who have not had special training
than for experimental Ss. Sensitivity to the fac-
tors that bring about an event, even though influ-
enced by intelligence, is greatly dependent on
other factors, especially those learning experi-
ences which are provided in the experimental
causal learning program.

5. Since it is common knowledge that increase
in age, signifying accumulation of experience, is
an important variable in concept formation (21),
it is hypothesized that there will be a develop-
mental increase in scores on the measures of
causality from fifth to sixth grade. The results
would be confounded by at least two factors, age
and special teaching methods for the experiment-
al Ss. Therefore, the direct developmental in-
crease ought to be demonstrated on the control Ss
only. In our culture a causal approach is taught
toward the natural science phenomena. On the
other hand, there is evidence (15) that magazines
and newspapers fail to make an analytic or dynam-
ic approach to the problems of development and
neglect to describe effective methods of dealing
with differential causes of behavior and variabil-
ity in the rate of development. Furthermore, it
has been shown (19) that the social studies read-
ers which are used in the elementary school place
emphasis on the external forms of behavior rather
than the dynamics underlying it. Consequently,
one might further hypothesize that for the control
Ss the developmental increase in physical causal-
ity will be greater than in social causality.


The rationale for these hypotheses is based on 
the assumption that there is no basic dichotomy 
between social and natural phenomena. The caus- 
al question ‘‘Why does a given phenomenon hap- 
pen?”’ is involved in both areas. Furthermore, 
the probabilistic nature of knowledge, the aware- 
ness that knowledge changes and expands, is recognized
in the natural as well as in the social sciences.
The inaccuracy of measurements and of
predictions is equally applicable to both social and 
natural phenomena, even though the error term is 
likely to be greater in measurements used by the 
social scientist. The basic principles involved in 
an understanding of cause-effect relationships are 
equally applicable in the physical and in the social
world. Consequently, there is justification in hypothesizing
that an experimental teaching program
which is designed to teach a basic under-
standing of principles of social causality will have 
a transfer effect on phenomena generally consid- 
ered as part of the physical world. 


The Test Instruments 





The following test instruments were used to 
evaluate the Ss’ knowledge and understanding in 
the area of social and physical causality: 


1. The Elementary Causal Test (ECT) which 
consists of 30 items of the true-false type. The 
test measures the child’s awareness of the dynam- 
ic, complex, variable nature of human behavior 
(16). The test has a Kuder-Richardson reliability
of .63 and a correlation with intelligence of -.29
(N = 245) for sixth-grade and -.37 (N = 158) for
fifth-grade Ss.

2. The Problem Situations Test (PST) which
consists of 22 multiple choice items describing
interaction between peers, siblings, parents, and 
teachers. The test measures the child’s willing- 
ness to be immediately punitive in a hypothetical 
situation where no retaliation is anticipated. The 
situations involve aggressive feelings, moral 
transgressions and personal problems (7, 9, 16). 
The test has a Kuder-Richardson reliability of
.717 and a correlation with intelligence of -.12 (N =
245) for sixth-grade, and -.12 (N = 158) for
fifth-grade Ss.


For both the ECT and the PST the lower score 
is the more causal score. 


3. In the following year when the study was repeated,
with some modifications to be described
later, the Problem Series III Test (PS III) was
utilized as a measure of social causality. This
change in the test instrument appeared to be necessary
since the experimental Ss approached the
ceiling of the PST and the ECT. The Problem
Series III test consists of 62 true-false items in-
volving various aspects of the concept of causal- 
ity in social situations. Answers can be classi- 
fied into four levels or degrees of causal under- 
standing: 

1) No concern or awareness as to behavior dynamics

2) Pseudo-causal approaches

a. Using mystical or magical explanations

b. Using over-generalized observation or
similar untested assumptions (stereotyping,
rationalizing)

3) Recognizing one, or a limited number of
observable factors

4) Recognizing a complex causation, searching
for the most probable hypothesis (17).

The Problem Series III test has a Kuder-Richardson
reliability of .73. Its correlation with intelligence
is -.25 (N = 113) for fifth-grade Ss. The
test-retest reliability with one school year intervening
(fall 1958 to spring 1959) is .49 (N = 113)
for fifth-grade Ss; broken down for experimental
and control groups the respective test-retest reliabilities
are .59 (N = 51) and .56 (N = 62).

4. The Physical Causal Test (PCT) consists of 
60 multiple choice items and was designed by the 
author to measure a subject’s understanding of 
physical causality as defined above. The test has
a Kuder-Richardson reliability of .80 and a correlation
with intelligence of .58 (N = 245). The
test is administered in two parts and subscores
can be utilized from each of the two parts. Physical
Causal Test Part I (PCT I) consists of 38 items
and mainly measures knowledge of causal explanations
of common phenomena and events in the
physical world. The test involves predominantly
the first two points of our definition on page 231.
The Kuder-Richardson reliability of this subscore
is .76. The Physical Causal Test Part II (PCT II)
consists of the remaining 22 items. Many of these
items were adopted by the author from Clark’s
doctoral dissertation (1) for use with fifth- and
sixth-grade Ss. The test, in line with the definition
on page 231, involves concepts three to six:
scientific methods, the approximate nature of
measurements, the approximate nature of predictions
and an awareness of the changing nature of
knowledge. While Part I is predominantly concerned
with knowledge of phenomena and events in
the physical world, Part II emphasizes the probabilistic
nature of scientific knowledge. Part II,
due to its shortness, has a Kuder-Richardson reliability
of .66.

5. In the succeeding year when the study was 
repeated this Physical Causal Test was revised. 
The total test of 60 items was divided into two 
forms, Form A and Form B, of 30 items each.
The content of the items, the degree of difficulty 
and the discriminatory power of the items used in 
Form A and Form B were as similar as feasible, 
so that two equivalent forms of the test were avail- 
able. The test-retest reliability of the two test 
forms over a two-week interval is .74 (N = 72).
The correlation with intelligence is .45 (N = 116)
for Form A and .44 (N = 116) for Form B.
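
The Kuder-Richardson reliabilities quoted for these instruments can be illustrated with a minimal sketch. It assumes formula 20 (KR-20), since the text does not say which Kuder-Richardson formula was used; the function name `kr20` and the small response matrix are hypothetical and are not data from the study.

```python
def kr20(responses):
    """KR-20 for a list of subjects, each a list of 0/1 item scores.

    Assumption: formula 20 with population variances throughout.
    """
    n_subjects = len(responses)
    n_items = len(responses[0])

    # Sum of item variances p * q, where p is the proportion scoring 1.
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_subjects
        sum_pq += p * (1.0 - p)

    # Population variance of the total scores.
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_subjects
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_subjects

    return (n_items / (n_items - 1)) * (1.0 - sum_pq / var_total)


# Hypothetical answers of five Ss to four dichotomously scored items.
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(example), 2))
```

Other things being equal, a coefficient of this kind tends to be lower for a shorter test, which is in line with the remark about Part II above.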


Since the Physical Causal Test has not been described
in the literature, a sample item will be
given for each of the six concepts of the definition 
of physical causality. 


1. An awareness of factors that cause a phenomenon.
What could you do to make the water in the air
form in drops on the outside of a glass?
A. Set a glass of ice water in a very warm
room.
B. Set a glass of ice water in the refrigerator.
C. Set a glass of warm water in a very warm
room.
D. Set an empty glass in a very warm room.
E. Set a glass of hot water in a very cold room.


2. An awareness of the multiplicity of factors
that cause a phenomenon.
Not all people require the same number of food
calories or food energies each day because:
A. They do not like to eat the same thing.
B. No two people do the same kind of work.
C. Some days a person is hungrier than on
other days.
D. People burn up different amounts of calories
depending whether they are a boy or girl and
where they are.
E. People burn up a different amount of calories
depending on what they are doing and how
much they weigh.


3. An awareness of common scientific procedures
and methods of fact finding.
A party of explorers made a trip to northern
Alaska to explore that country. Though they
lived there for a month in freezing and snowy
weather without enough warm clothing, none
of the men had any colds. Then they received
a shipment of warm clothing. A few days after
opening the boxes of clothing, more than half
the men came down with bad colds. If you
were a scientist interested in the causes of
colds, what conclusion would you draw from
this situation?
A. It proves that colds are caused by a germ
that can be carried in clothing.
B. It shows that it might be a good idea to
make some experiments to see if cold germs
can be carried in clothing.
C. It proves that cold, dampness and not enough
clothing will eventually, if not right away,
cause colds.
D. It shows that bad weather and not enough
clothing are not causes of colds.
E. It does not have any meaning for a scientist
because it was not a scientifically made
experiment.


4. An awareness that measures are only approximations.
Two men were measuring the strength of the
grips of the pupils in a sixth-grade class.
Each pupil had three trials on Monday and another
three on Tuesday. The conditions were
about the same both days. For some of the
children the average on Tuesday was as much
as two pounds higher than the day before and
for several others the average weight was as
much as two pounds lower. With which statement
do you agree?
A. Examiners can always read the scale exactly
right but children’s strength of grip
isn’t always the same.
B. Some differences should be expected from
one day to the next.
C. They didn’t get good measurements of those
who weren’t the same both days.
D. They got exact measurements for those who
were the same both days.
E. The children had the same strength both
days but the examiners read the scale carelessly.


5. An awareness that predictions are only approximations.
A well-known manufacturer of car tires produced
a new tire. After many tests, he concluded
the tire could be driven for at least
40,000 miles before it would be worn out. He
put the tire on the market with a guarantee that
the tire would last for that mileage, unless
there were accidents and unusual driving
conditions. Jane’s father traded in all his old
tires on a purchase of these new tires. What
statement about his tires would you say would
be most likely true after they had been driven
39,000 miles?
A. Probably all of these tires will be useable
at 39,000 miles if there were no accidents
or unusual driving conditions, but it is certain
that at least one of them will still be
useable.
B. If there were no accidents or unusual driving
conditions, all of these tires will still be
useable at 39,000 miles.
C. If there were no accidents and unusual driving
conditions, it is likely that these tires
will still be useable at 39,000 miles, but it
is also possible that all of them will be worn
out.
D. If there were no accidents or unusual driving
conditions, it is likely that some of the
tires will still be useable at 39,000 miles,
but it is certain that at least one of them
will be worn out.
E. Probably all of these tires will be worn out
at 39,000 miles because tires very seldom
last that long. The guarantee wouldn’t mean
much because it would be hard to prove that
there had been no accidents or unusual driving
conditions.


6. An awareness of the change and increase in
knowledge.
A student who wanted to become a farmer was
studying in school how to keep a farm so it will
grow good crops. He studied his book thoroughly
and took good notes on the lectures so
that he could always use this knowledge in keeping
the soil of his farm fertile. The textbook was
the best the teacher could get at the time and
he presented information accurately. After the
student finished school, four years passed before
he became a manager of a farm. When he
did become a manager, he applied to his farm
all the suggestions given in the course in soil
fertility and followed them very carefully. He
has been doing this for the last five years. With
which statement do you agree?
A. It was a good idea to follow very carefully
what he had learned in the class in soil fertility.
B. His way could be wrong because it would be
old and out-of-date.
C. He should check up and see about new information
about soil fertility that may have
come out.
D. It would have been better to get a brand new
book on soil fertility and have followed it
exactly because new ways are best.
E. It would be a good idea to find out about any
changes in knowledge because in nine years
there are sure to be improvements in every
field.





Procedures 

The study involves two independently obtained
sets of data: the first involved 413 fifth- and sixth-grade
Ss during the 1957-58 school year; the second
involved 113 fifth-grade Ss in the 1958-59
school year. The procedures for both aspects of
the study will be described separately. 

In the spring of 1958, the following tests were 
administered to 158 fifth- and 255 sixth-grade Ss 
in a midwestern community of 80,000:


1. The Elementary Causal Test (N = 403) 
2. The Problem Situations Test (N = 403) 
3. The Physical Causal Test (N = 413) 


Of these 413 Ss, 208 had participated in an ex- 
perimental learning program designed to develop 
a causal orientation toward human behavior and 
social problems. The remaining 205 Ss came 
from regular classrooms of the same school sys- 
tem. An attempt was made to match the experimental
classes with the control classes on such
variables as the teacher’s professional training, 
the teacher’s years of professional experience, 
the teacher’s age, and the socio-economic back- 
ground of the students. 

The causal learning program, conducted under
the auspices of the staff of the Preventive Psychiatry
Program, State University of Iowa, has been
in operation for several years. Some aspects of
the teacher training program and the content material
utilized in this learning program have been
described elsewhere in the literature (16).

Since the experimental and the control classes
differed in their respective mean IQ scores (sixth-grade
experimental 111.1 and control 106.1; fifth-grade
experimental 112.0 and control 106.4) and
since the Physical Causal Test has a moderately
high and significant correlation with IQ (Tables
III and IV), a simple analysis of covariance design
was utilized in order to control statistically
the concomitant variable of intelligence. As a
measure of intelligence the Ss’ latest Otis IQ
scores were obtained from the school’s cumulative
folders.
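
The covariance design described above can be sketched as follows. This is a minimal illustration of a one-way analysis of covariance with a single covariate (IQ), not a reproduction of the author's computations; the function name `ancova_f` and all scores are hypothetical.

```python
def ancova_f(groups):
    """One-way analysis of covariance with a single covariate.

    groups: list of (x_values, y_values) pairs, one pair per group,
    where x is the covariate (here IQ) and y the criterion score.
    Returns (F, df_between, df_within) for the adjusted group effect.
    """
    def sums(xs, ys):
        # Sums of squares and cross-products about the means.
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return sxx, syy, sxy

    # Total sums, taken about the grand means.
    all_x = [x for xs, _ in groups for x in xs]
    all_y = [y for _, ys in groups for y in ys]
    txx, tyy, txy = sums(all_x, all_y)

    # Within-group sums, pooled over groups.
    wxx = wyy = wxy = 0.0
    for xs, ys in groups:
        sxx, syy, sxy = sums(xs, ys)
        wxx, wyy, wxy = wxx + sxx, wyy + syy, wxy + sxy

    # Criterion sums of squares adjusted for regression on the covariate.
    adj_total = tyy - txy ** 2 / txx
    adj_within = wyy - wxy ** 2 / wxx
    adj_between = adj_total - adj_within

    df_between = len(groups) - 1
    df_within = len(all_x) - len(groups) - 1  # one df lost to the covariate
    f_ratio = (adj_between / df_between) / (adj_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical IQ and criterion scores for three control and three experimental Ss.
control = ([101, 102, 103], [1, 3, 2])
experimental = ([102, 103, 104], [4, 5, 6])
f_ratio, df_b, df_w = ancova_f([control, experimental])
print(round(f_ratio, 2), df_b, df_w)
```

The F ratio obtained in this way tests the difference between groups after the part of the criterion variance that is predictable from IQ has been removed.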

The three measures of causality were admin- 
istered at the end of the 1957-58 school year. It 
is assumed that differences which arose between 
experimental and control classes can be attribu- 
ted to the special learning program to which the 
experimental Ss had been exposed. In order to 
test whether this assumption is justified, the ex- 
periment was repeated with 113 fifth-grade Ss in 
the 1958-59 school year. In this second part of 
the study the following tests were administered: 


1. The Problem Series III, administered in
the fall of 1958 and re-administered in the
spring of 1959.
2. The Physical Causal Test Form A in the
fall of 1958.
3. The Physical Causal Test Form B in the
spring of 1959.


The scores obtained during the administration
in the fall of 1958 will be referred to as the ‘‘pre-test
scores’’; the scores obtained in the spring
of 1959 as the ‘‘post-test scores’’; and the differences
between the pre- and post-tests as the
‘‘growth scores.’’ All test scores plus the Otis
IQ from the school’s cumulative folders were available
for 51 experimental and 62 control Ss. None
of the experimental Ss had been in the experimental
program before. Since there were no significant
differences between IQ scores (Table VI),
the pre-test scores and growth scores were analyzed
by way of t tests.
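
The growth-score comparison can be sketched in a few lines. The sketch assumes a pooled-variance t test for two independent groups, which the text does not specify; the function names and all scores are hypothetical, and the example uses a test such as the PCT on which a higher score is the more causal score.

```python
import math


def growth_scores(pre, post):
    """Growth score of each S: post-test score minus pre-test score."""
    return [b - a for a, b in zip(pre, post)]


def pooled_t(sample_a, sample_b):
    """Independent-groups t ratio with pooled variance (an assumption)."""
    na, nb = len(sample_a), len(sample_b)
    ma, mb = sum(sample_a) / na, sum(sample_b) / nb
    ss_a = sum((v - ma) ** 2 for v in sample_a)
    ss_b = sum((v - mb) ** 2 for v in sample_b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    t = (ma - mb) / math.sqrt(pooled_var * (1.0 / na + 1.0 / nb))
    return t, df


# Hypothetical fall (pre-test) and spring (post-test) scores.
exp_growth = growth_scores(pre=[10, 12, 11], post=[13, 16, 16])
ctl_growth = growth_scores(pre=[11, 10, 12], post=[12, 12, 15])
t_ratio, df = pooled_t(exp_growth, ctl_growth)
print(round(t_ratio, 2), df)
```

For the measures of social causality, on which the lower score is the more causal score, growth would appear as a negative difference under this definition.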


Results and Discussion of Findings 





The data pertaining to the first and the second 
hypotheses comparing experimental and control Ss 
on the criterion variables are reported in Table I 
for the sixth-grade Ss and in Table II for the fifth- 
grade Ss. Table I reports the analysis of covar- 
iance for 245 sixth-grade Ss for the ECT and the 
PST and also for 255 sixth-grade Ss for the PCT. 
Table II presents the corresponding data for 158 
fifth-grade Ss. Tests for homogeneity of variance
for all six sets of data in Table I and Table II as
well as the data reported in Table VIII are evidence
that we are justified in the necessary assumption
of homogeneity of variance.

The data obtained from the two measures of social
causality basically support the first hypothesis.
For the Elementary Causal Test the F ratios
are 19.89 for the sixth-grade Ss (Table I) and 32.73
for fifth-grade Ss (Table II). These differences
are highly significant and give substantial support
to the hypothesis that on the Elementary Causal
Test the experimental Ss respond significantly
more causally than the control Ss.

The data from the Problem Situations Test are
not quite as convincing: the F ratio for sixth-grade
Ss is 1.94, not significant (Table I), and 22.78,
significant at the .001 level, for fifth-grade Ss (Table
II). It can be stated that on the Problem Situations
Test fifth-grade experimental Ss respond
more causally and less punitively than do their
control Ss. The data do not warrant a similar
conclusion for the sixth-grade Ss. However, inspection
of individual scores indicates that many
experimental Ss approach and reach the ceiling,
that is, the lowest possible score of this test. On
an average, experimental Ss answer non-causally
only four out of twenty-two items, which means that
a number of Ss answer only zero, one or two
items non-causally. They have little or no opportunity
to improve further and to appropriately
show their knowledge of causality. Consequently
their knowledge and understanding of causality is
not measured adequately by this test.

The tendency for experimental Ss to reach the 
ceiling of the Problem Situations Test after par- 
ticipating in a causal learning program has been 
found in previous studies (16). This fact made it 
necessary to develop a new test instrument for 
the measurement of social causality. The Problem
Series III was developed as a result of this
finding and used in the follow-up study in 1958-59.
If one considers this particular weakness of the 
PST which becomes most obvious for sixth-grade 
Ss, the data substantially support the hypothesis 
that experimental and control Ss do differ in their 
responses to these two measures of social caus- 
ality. The experimental Ss respond more caus- 
ally and less punitively. Since intelligence is con- 
trolled by the method of covariance it is assumed, 
in agreement with other studies, that the obtained 
differences can be attributed to the learning pro- 
gram. 

The data for the second hypothesis are also
reported in Table I and Table II. As a measure
of transfer, the Physical Causal Test, which is
relatively unrelated to the learning program in its
content but is related in the method of problem
solving, was utilized. The F ratios are 4.13,
significant at the .05 level for sixth-grade Ss, and
6.91, significant at the .01 level for fifth-grade
Ss. These data demonstrate convincingly that the
experimental Ss had a better understanding of factors
that bring about an event in the physical
world and were more aware of the probabilistic
nature of knowledge than their control Ss. It is
inferred that this difference is due to a transfer
effect from the experimental learning program.
Judd’s theory of transfer of training (5) states that
learning transfers through the understanding of
underlying principles and the formation of generalizations.
One might argue that some of the
basic principles involved in the solving of these
test problems are analogous to the concepts taught
in the experimental learning program but applied
in the PCT to a different content.

The coefficients of correlation pertaining to the
third hypothesis can be found in Tables III, IV,
and VII. As a word of explanation, the reader is
reminded that the measures of social causality
(the ECT, the PST, and PS III) are scored in such
a way that the lower score is the more causal
score, while on the PCT the higher score is the
more causal score. This fact results in negative
correlations between social and physical causal
scores and between social causal scores and IQ.
Consequently, it is the magnitude of the correlation
and not its direction that matters in the interpretation
of findings.

For sixth-grade Ss (N = 245) the PCT/IQ correlation
is .58; on the other hand, the PST/IQ
correlation is only -.12, and the ECT/IQ correlation
is -.29. Disregarding the negative sign,
for reasons explained above, Fisher’s z test of
differences between r’s indicates, in support of the
third hypothesis, that the PCT/IQ correlation is
significantly greater than both the ECT/IQ and
the PST/IQ correlations.

The correlations for fifth-grade Ss (N = 158)
are quite similar in magnitude to those for sixth-grade
Ss: PCT/IQ is .59, PST/IQ is -.12, and
ECT/IQ is -.37. Fisher’s z test of differences
between r’s again evidences that the PCT/IQ correlation
is significantly greater than both the ECT/IQ
and the PST/IQ correlations. The hypothesis
that measures of social causality are not as highly
correlated to intelligence as those of physical
causality is substantiated by the 1957-58 data.

In taking a preview of the 1958-59 data in respect
to this hypothesis (Table VII), we observe
that the PCT Form A/IQ correlation obtained in
the fall of 1958 is .45 (N = 116) and the Form B/IQ
correlation obtained in the spring of 1959 is
.44 (N = 116). The PS III/IQ correlation is -.36
for the fall of 1958 and -.25 for the spring of 1959
administrations. The differences between the two
spring correlations approach the .10 level of significance.
These data do not repudiate the 1957-58
findings.

It appears that the solving of natural science
problems is more enhanced by intellectual ability
than is the solving of problems involving social
conflicts, motivation of behavior, and a willingness
to understand others. There are two possible
explanations for this finding: a) The understanding
of physical science phenomena is taught
as an intellectual subject and items similar to
those of the PCT are not uncommon in intelligence
tests. The ability to understand the behavior of
others is often considered to be related to one’s
own emotional make-up and one’s personality
rather than to one’s intellect. b) An analytic
and causal approach to natural phenomena is part
of our intellectual culture and expectations, while
many segments of our culture are more concerned
with the overt forms of behavior rather than its
underlying dynamics and motives.

It was assumed in hypothesis four that the
measures of causality show a positive correlation
with intelligence. This assumption is justified
since the respective correlations are significant
for the ECT and the PCT but not for the PST (Tables
III and IV). For the 1958-59 data, all respective
correlations are significant.

TABLE III

INTERCORRELATIONS OF THE CRITERION VARIABLES AND IQ FOR
SIXTH-GRADE SUBJECTS (1957-58 DATA)

                          Elementary     Problem            Physical
                          Causal Test    Situations Test    Causal Test

Intelligence Quotient
  Sixth-Grade Ss
  Experimental Ss
  Control Ss

Elementary Causal Test
  Sixth-Grade Ss
  Experimental Ss
  Control Ss

Problem Situations Test
  Sixth-Grade Ss
  Experimental Ss
  Control Ss

 * Significant at the .05 level
** Significant at the .01 level


TABLE IV

INTERCORRELATIONS OF THE CRITERION VARIABLES AND IQ FOR
FIFTH-GRADE SUBJECTS (1957-58 DATA)

                          Elementary     Problem            Physical
                          Causal Test    Situations Test    Causal Test

Intelligence Quotient
  Fifth-Grade Ss
  Experimental Ss
  Control Ss

Elementary Causal Test
  Fifth-Grade Ss
  Experimental Ss
  Control Ss

Problem Situations Test
  Fifth-Grade Ss
  Experimental Ss
  Control Ss

 * Significant at the .05 level
** Significant at the .01 level


The second part of the fourth hypothesis relates
to differences between experimental and
control Ss in their respective correlations between
IQ and the criterion variables. In comparing the
IQ/criterion variables correlations between the
experimental and the control Ss, one observes, in
agreement with the hypothesis, that five out of six
(Tables III and IV) are in the predicted direction.
The PST/IQ correlation for fifth-grade Ss constitutes
the only exception. However, only in one of
five instances, namely the PST/IQ correlation for
sixth-grade Ss, does the difference approach the
.05 level of significance. Anticipating the 1958-59
data (Table VII) one observes that in the fall
administration the PCT/IQ correlations of the experimental
and those of the control groups are not
significantly different. In the spring administration,
after the experimental Ss had been in the
program for nine months, there is a significant
difference; control Ss show a significantly higher
PCT/IQ correlation than experimental Ss, thus
supporting the hypothesis. The respective data
for the PS III demonstrate a significant difference
between experimental (r = -.13) and control (r =
-.52) Ss for the fall, while the difference for the
spring administration (r = -.12 experimental; r =
-.40 control) approaches the .10 level of significance.
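
The comparison of the PS III/IQ correlations of the two groups can be retraced with Fisher's z transformation for two independent correlations. The sketch below uses the correlations reported above together with the group sizes given under Procedures (51 experimental and 62 control Ss); the two-sided probability and the function name `fisher_z_difference` are assumptions, since the text does not state how the test was carried out.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """Fisher's z test for the difference between two independent r's.

    Returns the normal deviate and a two-sided probability (assumed).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # r-to-z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail area
    return z, p


# PS III/IQ correlations for experimental and control Ss.
z_fall, p_fall = fisher_z_difference(-0.13, 51, -0.52, 62)
z_spring, p_spring = fisher_z_difference(-0.12, 51, -0.40, 62)
print(round(z_fall, 2), round(p_fall, 3))
print(round(z_spring, 2), round(p_spring, 3))
```

Under these assumptions the fall comparison comes out near z = 2.3 and the spring comparison near z = 1.6, which is in line with the description of the fall difference as significant and of the spring difference as approaching the .10 level.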

Summarizing the discussion relating to the
fourth hypothesis, differences between experimental
and control Ss are in the predicted direction;
however, they are not consistently significant,
and, therefore, give only tentative support
for the hypothesis. Thus while it is generally true
that the more intelligent Ss have a better understanding
of cause-effect relationships, the probabilistic
nature of knowledge, etc., this relationship,
as was hypothesized, appears to be more
pronounced for Ss who have not participated in an
experimental learning program. The data, especially
from the 1958-59 study, suggest that the
experimental learning program reduces the relationship
between intelligence and causal understanding.
This finding is in agreement with the
data reported in Tables I and II, namely that the
causal learning program increases a S’s awareness
of what has been defined as ‘‘causality’’ regardless
of his intelligence. There even appears
to be a negative though non-significant relationship
between intelligence and the growth score on the
Physical Causal Test for the experimental Ss (r =
-.15, N = 51).

The fifth hypothesis, supported by various developmental
studies on causality (2, 4, 6, 18, 21),
states that children as they grow older become
more capable of understanding the physical and the 
social world in which they live due to their accu- 
mulated knowledge and experience. In order to 
test this hypothesis, t tests were computed on the 
criterion variables between the fifth- and sixth- 
grade control Ss. In this instance, the t test suf- 
fices since there are no statistically significant 
differences between the fifth- and sixth-grade Ss 
in their mean Otis intelligence scores (Table V). 
No further importance will be attached to this since 
there would be—for obvious reasons—differences 
if the mental age had been used instead of the IQ. 

Table V reports the data for 201 control Ss. In
support of the hypothesis there are significant differences
on the Physical Causal Test. An understanding
of causality as measured by the PCT is
enhanced by an increase in age and experience.
There also is a significant difference on the Problem
Situations Test, while the difference on the
Elementary Causal Test approaches the .05 level
of significance. Consequently, the data from the
three tests of causality support the hypothesis that
sixth-grade Ss have a better understanding of
causality than fifth-grade Ss.

The data pertaining to the second part of the
hypothesis are not conclusive; however, it is suggested
by the magnitude of the t ratios that developmental
increase in causality is greater on the
PCT than on the ECT, for which the t ratio is not
quite significant.


Modification of the Study 





The question arises as to what extent the experimental
and control Ss differed originally, and
whether the obtained differences—as was as- 
sumed—are due to the causal learning program 
and not to some other factors. Since, as was 
shown, intelligence is an influential factor, its 
effect was eliminated by way of the statistical 
method of covariance. 

In order to further support our basic hypothesis 
and to control the effect of the experimental learn- 
ing program more effectively, the study with some 
modification was repeated the following school 
year with fifth-grade Ss. Since there are no sig- 
nificant differences in intelligence between the 
experimental and the control Ss, the data in Table 
VI are reported by way of t tests between group 
means. In the 1958-59 administration the Physi- 
cal Causal Test Form A was administered in the 
fall of 1958 and the Form B in the spring of 1959. 
As a measure of social causality the Problem
Series III test was administered in the fall of 1958
and again in the spring of 1959. This makes it
possible to compare the experimental and the control
Ss at the beginning of the school year and their
growth of causality during the school year. Table
VI supplies the 1958-59 data for the 113 fifth-grade
Ss. At the beginning of the fall 1958 school
year, there were no significant differences on the
pre-test for the PS III and the PCT; neither were
there significant differences in intelligence. Consequently,
there is rather conclusive evidence for
the assumption that the experimental Ss did not
differ from the control Ss at the beginning of the
1958-59 school year on the criterion variables
under investigation.

The effect of the experimental learning program
was measured in terms of growth scores
on the PCT and the PS III.

Even though the growth score was utilized and
not the terminal score, as in the 1957-58 analysis,
findings are identical. In agreement with our first
hypothesis, the data indicate that even though the
two groups do not differ significantly on the pre-test
of the Problem Series III, the experimental
Ss show significantly more growth, t ratio 5.75
(significant at the .001 level), on this measure of
social causality than the control Ss. Furthermore,
in support of the second hypothesis one observes
that even though both groups do not differ
significantly on the Physical Causal Test at the
time of the pre-test, the experimental Ss show a
significantly greater mean gain between the two
test administrations than do the control Ss, thus
again supporting the hypothesis concerning a
transfer effect from the experimental learning
program to a measure of physical causality.

In conclusion, the data from the 1958-59 study 
give additional support to the hypothesis that experimental
Ss grow more rapidly in their understanding
of causality than control Ss. Since it was
shown that the two groups did not differ signifi- 
cantly on the criterion variables at the beginning 
of the experiment, it can be assumed that the 
growth in causality is a product of the experi- 
mental causal learning program. Furthermore, 
there is justification in concluding that the exper- 
imental learning program with its emphasis on 
dynamics of behavior and human motivation does 
contribute to a transfer effect which enhances the 
experimental Ss’ performance in solving problems 
involving an understanding of common scientific 
phenomena, an awareness of the multiplicity of 
factors that may be involved in bringing about an 
event, an awareness that knowledge may change, 
an understanding of error terms in measurement 
and prediction, and an appreciation of the prob- 
abilistic nature of scientific knowledge. 

The findings of a previous study (11) suggest a
further problem. Is the transfer effect as measured
by the Physical Causal Test greater for the
kind of items that are presented in PCT Part I, involving
knowledge of common phenomena and
events in the physical world, or is it greater for
the items in PCT Part II, which emphasize the
probabilistic nature of knowledge? Furthermore,
if there is a differential transfer effect, is it the
same for fifth- and for sixth-grade Ss?

In order to investigate this question the scores 
of the Physical Causal Test reported in Table I 
and Table II were broken down into their sub- 
scores, PCT Part land PCT Part Il. In Table 
VIII these subscores are analyzed by a covariance 
design in order to eliminate the influence of IQ 
and are reported separately for fifth- and sixth- 
grade Ss. The data for sixth-grade Ss, in agree-
ment with previous findings (11), demonstrate
that there is no significant difference between ex-
perimental and control Ss on the PCT Part I. On
the other hand, the F ratio for the PCT Part II is
12.36, significant at the .001 level. Experiment-
al sixth-grade Ss have a significantly better un- 
derstanding of the probabilistic nature of knowl- 
edge than do control Ss, but do not differ in the 
knowledge of common phenomena and events in 
the physical world as measured by these tests. 

For fifth-grade Ss the findings are reversed. 





The PCT Part I scores yield an F ratio of 11.68,
significant at the .001 level, indicating that even
though there is no systematic difference between
sixth-grade experimental and control Ss, such a
difference exists for fifth-grade Ss. On the other
hand, while the difference was highly significant
for sixth-grade Ss on the PCT Part II, there is no
significant difference on this subscore for fifth-
grade Ss.

It appears that there is a treatment and grade 
level interaction effect, which only becomes ob- 
vious when the total Physical Causal Test score 
is broken down into its two subscores. Fifth-
grade experimental Ss are better able to solve the 
kind of problems that make up the PCT Part I, 
while sixth-grade experimental Ss solve the prob- 
lems of the PCT Part II more effectively than their
respective control Ss. 

The differential findings on the PCT Part II can
be explained fairly easily by the concepts involved.
The items making up the PCT Part II are relative-
ly difficult and complex. Many of these items 
were originally developed by Clark (1) for use with 
junior high-school Ss and were adopted by the au- 
thor for use with fifth- and sixth-grade Ss. While
the adaptation for sixth-grade Ss appears to have 
resulted in an appropriate degree of difficulty, the 
items do seem to be fairly difficult for both exper- 
imental and control fifth-grade Ss, and the exper-
imental learning effect does not show on these 
items. Furthermore, the particular content of the 
experimental learning program may contribute to 
the findings on the PCT Part II, for there is a
greater emphasis in the sixth-grade material on
concepts involving probability, change of knowl-
edge, and the inaccuracy of measurement and pre-
dictions. 

The findings pertaining to the PCT Part I are 
more difficult to explain. It appears that for the 
fifth-grade experimental Ss there is a real learn- 
ing and/or transfer effect which increases the PCT
Part I subscore and which is assumed to be due to
the experimental learning program. Why this is
not the case for sixth-grade Ss is not obvious.
Perhaps the PCT Part I items are too direct;
many items ask for relatively simple explanations
and at times only for one or two factors that may
have caused an event. The sixth-grade experi-
mental Ss might have reached a level of sophisti- 
cation in their causal thinking, particularly in re- 
spect to the multiplicity of causes and the inter- 
acting nature of factors that bring about an event, 
so that they were confused by the rather direct ex- 
planations asked for in the PCT Part I. Also the 
sixth-grade control teacher, with the post-Sputnik
emphasis on natural science teaching in the cur-
riculum, may put greater emphasis on science
teaching and since the concepts covered in PCT
Part I would be taught in elementary school in a
conventional physical science class, such science
teaching would indeed help a student in solving the
kind of problems which appear in PCT Part I but
not those in PCT Part II. These explanations are
speculative, and there may be others. 

Nevertheless, it is obvious that the nature of 
the items in PCT Part I is basically different
from those in Part II. This is also indicated by
the moderately high correlation between these
two parts of the Physical Causal Test. The cor-
relation between the PCT Part I and PCT Part II
subscores is .42 (N = 158) for fifth-grade Ss and
.44 (N = 245) for sixth-grade Ss, indicating that
the two parts of the PCT measure two different
aspects of the S’s understanding of his physical
environment, since the amount of variance that
can be accounted for by the correlation of the two
measures is 20 percent. 
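
The variance figure quoted above follows from squaring the correlation coefficient. The short sketch below is an illustration added for the reader, not part of the original analysis; the function name `shared_variance` is hypothetical. For the two reported correlations it gives roughly 18 and 19 percent, which is in line with the rounded figure of 20 percent in the text.

```python
def shared_variance(r: float) -> float:
    """Proportion of variance two measures share, given their correlation r."""
    return r * r


# Correlations between PCT Part I and Part II as reported in the text.
for grade, r in (("fifth", 0.42), ("sixth", 0.44)):
    pct = shared_variance(r)
    print(f"{grade}-grade Ss: r = {r:.2f}, shared variance = {pct:.1%}")
```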

The question arises when the 1957-58 study 
was repeated for fifth-grade Ss in 1958-59 whether 
the findings were duplicated. Inspection of Table 
IX supports the findings from the preceding year,
even though growth measures are utilized. On 
the pre-test scores of both PCT Part I and PCT 
Part II there are no significant differences be- 
tween experimental and control Ss. On the growth
measure, obtained by subtracting the pre-score
from the post-score, there is a significant dif-
ference on PCT Part I but not on PCT Part II.
This is an exact duplication of the 1957-58 data,
even though the revised PCT test contained only
half as many Part I and Part II items. The as-
sumption that the increase in score is due to the
experimental learning program is substantiated
by the fact that in the 1958-59 data there existed
no known differences on the criterion variables at
the beginning of the experiment. 

These findings show that on these two subtests
of the Physical Causal Test the learning effect of 
the experimental program affects fifth- and sixth- 
grade Ss differently. Further investigation is 
needed to satisfactorily explain these findings. 


Summary 


Experimental (N = 208) and control (N = 205) 
Ss from fifth- and sixth-grade classes were com- 
pared in respect to their mean scores on meas- 
urements of social and physical causality in order 
to determine the effect and the transfer effect of 
a mental health learning program which is de- 
signed to develop an understanding and apprecia-
tion of the motives of human behavior and an
awareness of the dynamics of factors operating in
social situations. Data were analyzed using a co- 
variance design in order to eliminate the influ- 
ence of IQ on the findings. 

The findings of the study allow for the follow- 
ing generalizations: 





1. The experimental learning program sub-
stantially contributes to an understanding
of the dynamic, complex, and variable na-
ture of human behavior in fifth- and sixth-
grade Ss as measured by the ECT.


2. The learning program reduces a fifth-grade
S’s willingness to be punitive as measured
by the PST. Whether it does so for sixth-
grade Ss cannot be assessed clearly due to
certain limitations of the test instrument.


3. The learning program has a transfer effect
for fifth- and sixth-grade Ss on an under-
standing of the causes of common events
and phenomena in the physical world and an
awareness of the probabilistic nature of
knowledge as measured by the PCT.


4. The transfer effect is different for fifth- and
sixth-grade Ss. Fifth-grade experimental
Ss have a better understanding of the factors
that produce a common event, as measured
by the PCT Part I, than their control Ss.
Sixth-grade experimental Ss have a better
understanding of the probabilistic nature of
knowledge, as measured by the PCT Part II,
than their respective control Ss. Further
investigation is needed to explain this phe-
nomenon.


5. Measures of physical causality have a high-
er correlation with IQ than measures of so-
cial causality.


6. There is a tendency for experimental Ss to
obtain lower correlations between the meas-
ures of causality and IQ than is the case for
control Ss. It appears that the learning pro-
gram increases a S’s understanding of so-
cial and physical causality regardless of the
S’s level of intellectual functioning.


7. There is a developmental increase in under-
standing of causality as measured by the
ECT, PST and PCT from fifth- to sixth-
grade for control Ss. This increase appears
to be greater in the physical than in the so-
cial science area.


8. Using fifth-grade Ss and modified test in-
struments in the subsequent year, most find-
ings were duplicated. Since it could be dem-
onstrated that no known differences existed
on the criterion variables at the beginning
of the experiment, the follow-up study pro-
vides substantial evidence for the assump-
tion that these effects are due to the exper-
imental learning program.


9. While one cannot draw conclusions beyond
this particular experimental learning pro-
gram and beyond the specific test instru-
ments utilized in this study, the findings are
supported in general by other studies con-
ducted as part of the Preventive Psychiatry
Program (10, 13, 14, 16, 20).
A COMPARISON OF SCORES OBTAINED
BY ADMINISTERING A TEST NORMALLY 
AND VISUALLY * 


H. A. CURTIS and R. P. KROPP 
Florida State University 


THIS IS A REPORT of the first of a series of 
studies designed, in part, to ascertain the feasi- 
bility of administering via television group stan- 
dardized tests which are commonly used in the
public schools. Since an increasing amount of in- 
struction is executed through the medium of tele- 
vision, it is relevant to inquire whether testing 
might also be done by television in such a way as 
to obtain, on the one hand, the same results for 
students as would be obtained had they been tested 
under normal conditions, and, on the other hand, 
to obtain more valid results than are afforded by 
the usual method of administration. Also, it is
relevant to explore the possibilities afforded by 
television with respect to gaining test data that 
simply cannot be obtained by the traditional pa- 
per-and-pencil tests due to their inherent limita- 
tions. This report deals only with a description 
of attempts to gain by television administration 
test results which are comparable to those obtain- 
ed under normal administrative conditions. 

An examination of television as a device by 
which to communicate test materials revealed a
number of problems. The most notable of these 
problems is the precise method by which the test 
might be presented within the limiting conditions 
imposed by the nature of television. When a test 
is administered under normal conditions, it is 
usual to inform the examinee of the total time al- 
lowed and then to present him with all the test i- 
tems. He is free to determine the amount of time 
he will spend on each item, to select items he 
will attempt, and to re-attempt the items which 
he had previously omitted. But the size of the 
television screen precludes presenting all items 
simultaneously to the student, thus some of the 
freedoms permitted in the normal administration 
of tests will be denied him when the test is pre- 
sented by television. Consequently, a substan- 
tially different test administration technique must 
be used for television testing. 





*Footnotes will be found at the end of the article. 





The technique which is reported here is called 
‘‘pacing.’’ It involves the serial presentation of
one test item or a small group of items at a time 
until all items in the test have been exhausted. It 
does not allow the examinee to refer back to items 
which have previously been exposed. A single 
presentation of an item or group of items is re- 
ferred to as an exposure. The duration of the ex- 
posure is called the intra-exposure interval. The 
time elapsing between the end of one exposure and 
the beginning of the next exposure is called the in- 
ter-exposure interval. The serial exposure of 
small portions of tests, with the exposures pos- 
sibly being separated by noticeable intervals of 
time, makes possible presentation of test mater- 
ials by television. However, it is radically dif- 
ferent from normal procedure and it poses new 
variables in the administrative technique which 
must be studied. 

In addition to the variables already mentioned, 
there are at least the following: item-order, i- 
tem-type, audio and/or visual presentation, kind 
of test (intelligence, achievement, adjustment, 
etc.), guessing instructions, etc. 

The possibility of using the pacing technique 
hinges on the possibility of setting values on these 
variables that will have the effect of producing 
test results comparable to those that would have 
been obtained under normal administrative condi- 
tions. The following is a description of a first- 
attempt in this direction. 


The general problem reported on here was a 
study of the changes in test scores of a group of 
subjects who were administered the same test un- 
der three sets of administrative conditions. 
These three sets of conditions are as follows. 
One administration of the test was done in confor-
mance with normal procedures, i.e., pupils were
given test booklets, answer sheets, etc. This ad-
ministration is subsequently referred to as the 
‘‘control’’ administration. The other two sets of 
administrative conditions were experimental in 
nature and involved projecting test items on a
screen as the method of presenting the test to the 
subjects. One of these experimental administra- 
tions consisted of presenting test items one at a 
time to the examinees and it is referred to as the 
‘‘single’’ presentation. The other experimental
administration involved presenting the items three
at a time and is referred to as the ‘‘triad’’ ad-
ministration. More explicit descriptions of the 
experimental conditions appear in the subsequent 
section. 

The following comparisons were made of the 
scores obtained under these three sets of con- 
ditions: means, frequency of response, guessing, 
and the factor patterns of each set of scores. 

Although the over-riding purpose is to deter- 
mine a feasible method of administering tests by 
television, the experimental administrations of
the test were not done by television due to the ex- 
pense and awkwardness which it would have en- 
tailed. Items were reproduced on slides and were 
projected on a 3' x 5' screen. Slide format con-
formed to the format of the TV screen. The il- 
lumination level maintained in the testing room 
was comparable to that maintained for TV view- 
ing. 


Procedures 


Sample. The sample consisted of a ninth- 
grade class of white students. Complete sets of
test data were collected from twenty-nine of the 
class members. Four students from whom in- 
complete test data were gathered were discarded 
prior to the analysis of the data. 

Tests. The following tests were administered 
to the subjects under normal conditions: 

School Ability Test, Form 3A, of the Coopera-
tive School and College Ability Test 

Iowa Test of Educational Development, Form 
x3s%, Tests 3-7 

Test 3. Correctness and Appropriateness of Expression

Test 4. Ability to do Quantitative Thinking

Test 5. Ability to Interpret Reading Materials in the Social Sciences

Test 6. Ability to Interpret Reading Materials in the Natural Sciences

Test 7. Ability to Interpret Literary Materials

Thurstone Temperament Schedule3 

Gordon Personal Profile4 

Gordon Personal Inventory4 

SRA Primary Mental Abilities, ages 11-173 

Iowa Silent Reading Test, New Edition, Ad- 
vanced 

Clerical Speed and Accuracy of the Differential 
Aptitude Test5 





The School Ability Test was administered nor- 
mally and experimentally. All other tests men- 
tioned above were administered to provide data on 
the basis of which to evaluate the differences be- 
tween the results obtained from the normal and 
experimental administrations of the School Ability 
Test. 

The School Ability Test was used as the exper- 
imental test6. The test consists of four parts.
Parts I and III assess verbal abilities, and Parts
II and IV assess quantitative abilities. On the ba-
sis of item analysis data supplied by the Educa- 
tional Testing Service, the test was broken into 
two half-tests, thus permitting the use of a half- 
test for each set of experimental conditions and 
thereby eliminating item overlap between the two 
experimental administrations. Data dealing with 
the comparability of these half-tests appear in 
Table I. The column entitled ‘‘N’’ indicates the 
number of items in each part of the half-test. 

Experimental Conditions. There were simi- 
larities and differences between the experimental 
conditions under which the half-tests were admin- 
istered on two occasions. The similar conditions 
were as follows: no inter-exposure interval, all 
items were five-choice multiple choice, the total 
testing time devoted to item solution was the same, 
directions were the same, and the mode of item 
presentation was visual. 

The chief differences between the experimental 
administrations were the number of items pre- 
sented per exposure, the intra-exposure interval, 
and the order in which the items were presented. 

The single item administration presented one 
item per exposure. The exposure duration was 
determined by dividing the number of items in- 
cluded in each part of the test into the amount of 
time the test authors allotted for that part of the 
test. Since the number of items and time allowed
differed from part to part of the test, the expos-
ure interval was not constant throughout. The
following durations were used: Part I, 30 seconds;
Part II, 48 seconds; Part III, 20 seconds; and
Part IV, 60 seconds. The items within each part
of the half-test were presented in the same order
in which they appeared in the parent test, thus the
item order was that of ascending difficulty. 

The triad administration presented three items 
per exposure. The exposure interval for each 
part was three times that allowed for the single 
item administration. The item order within each
part was determined in this way. Triads were 
constructed to have the same mean difficulties 
and ranges, insofar as possible. Within a triad, 
items were arranged from least to most difficult. 
Since it was impossible to make the range of dif- 
ficulty for each triad exactly the same, the triads 
were arranged from greatest to least range. In
short, the order of items within a triad presented 
a rhythm of difficulty--simple, average, difficult 
--which was repeated throughout the entire test. 
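
The timing rules described above can be sketched in a few lines. This is an illustration only: the single-item durations are those reported in the text, while the function names and the 15-item, 12-minute example are hypothetical values chosen to show the division and are not taken from the test manual.

```python
# Single-item exposure durations (seconds) reported for each part of the test.
SINGLE_SECONDS = {"I": 30, "II": 48, "III": 20, "IV": 60}
ITEMS_PER_TRIAD = 3


def exposure_seconds(time_allotted_seconds: float, n_items: int) -> float:
    """Time allotted to a part divided evenly over the items in that part."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return time_allotted_seconds / n_items


def triad_seconds(single_seconds):
    """Triad exposure is three times the single-item exposure for each part."""
    return {part: ITEMS_PER_TRIAD * s for part, s in single_seconds.items()}


print(triad_seconds(SINGLE_SECONDS))

# Hypothetical part: 15 items with a 12-minute limit (values assumed).
print(exposure_seconds(12 * 60, 15))
```

Under either presentation the total time devoted to item solution within a part is the same, since three items shown for three times as long consume the same number of seconds as three single exposures.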
TABLE I


COMPARISON OF HALF-TESTS DEVISED FROM THE SCHOOL ABILITY TEST


Form A              Form B

Sigma   p range     Sigma   p range

.24     .14-.91     .20     .07-.75
.24     .09-.91     .23     .09-.84
.21     .11-.82     .23     .14-.95
.23     .06-.80     .24     .08-.80





TABLE II


MEANS AND STANDARD DEVIATIONS ON HALF-TESTS FROM SAT,


Mean

17.068
16.757
. 619
. 930
. 930
. 102
. 378
. 550
The reader will note that thirteen items are
contained in each of Parts II and IV of the test. 
Actually fifteen items were used in each so that 
there would be five triads. Two were borrowed 
from the comparable part of the other form. 
However, these borrowed items were not scored. 

Testing Schedule. The tests were administer- 
ed in two sittings, the first preceding the second 
by a period of three months. The first sitting in- 
cluded the normal administration of the School A- 
bility Test and the five tests of the Iowa Tests of 
Educational Development. These tests were ad- 
ministered in half-day sessions on three consec- 
utive days. 

The remaining tests were administered over a 
three-day period as follows: First day, morning 
--Gordon Personal Profile and Gordon Personal 
Inventory; afternoon--single item experimental 
half-test. Second day, morning--SRA Primary 
Mental Abilities; afternoon--triad experimental
half-test. Third day, morning--Thurstone Tem-
perament Schedule, DAT Clerical Speed and Ac-
curacy, and Test I of the Iowa Silent Reading Test.


Experimental Directions. Special directions 
and sample items were used when administering 
the experimental test. The directions were dis- 
tributed, one copy to each subject, and were read 
aloud by the examiner while the subjects read 
them silently. Prior to the presentation of each 
of the four parts of each experimental test, special
directions were given and a sample item was ex-
posed on the screen. Subjects were at no time in-
formed of the amount of time they could spend on
each exposure. They were informed that no ques-
tions could be asked during the administration of
any part of the test. They were informed in the
instructions that they would not be able to refer
back to items which they might not have answered
when they had been presented. 

No unusual circumstances arose during the ad- 
ministration of the experimental tests. 





Results 


Comparison of Means. The School Ability Test
that was administered under normal conditions 
was re-scored several times to gain scores over 
items which appeared in the half-tests. In ac- 
cordance with customary usage of the School A- 
bility Test, Parts I and III were combined to pro-
vide a verbal score and Parts II and IV were com-
bined to provide a quantitative score. 

The means and standard deviations are pre- 
sented in Table II on the previous page. In this 
table, and hereafter, the following code is em- 
ployed to conserve space: C, control; X, experi- 
mental; V, verbal; Q, quantitative; S, single; and 
T, triad. 

Because all sets of data were gathered from 
the same subjects, it was necessary when com- 
puting t tests of mean differences to make correc-
tions for correlated performances. The relevant
intercorrelations appear in Table III and the t
tests appear in Table IV. 
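
The correction for correlated performances can be illustrated with the usual textbook form of the t ratio for two means obtained from the same subjects. The article does not print the formula the authors used, so the sketch below is an assumption on that point; the function names and the six pairs of scores are hypothetical and are not data from this study.

```python
import math
import statistics


def t_correlated_means(x, y):
    """t for the difference between two means from the same subjects.

    The standard error of the difference is reduced by the term
    2 * r * s_x * s_y, where r is the correlation between the two sets.
    """
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    r = cov / (sx * sy)
    se = math.sqrt((sx ** 2 + sy ** 2 - 2 * r * sx * sy) / n)
    return (mx - my) / se


def t_from_differences(x, y):
    """Equivalent computation working directly on the paired differences."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Hypothetical scores for six subjects under two administrations.
control = [14, 17, 12, 19, 16, 15]
paced = [15, 18, 12, 21, 16, 17]
print(t_correlated_means(control, paced))
print(t_from_differences(control, paced))
```

The two functions return the same value, which shows that subtracting the correlation term is the same as testing the mean of the paired differences against zero.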

The following generalization might be tenta- 
tively drawn from the tabular data. The XT ad- 
ministrative conditions lead to higher scores than 
do the XS conditions. On the verbal section, ei- 
ther experimental condition produces higher mean 
scores than does the control administration. But 
apparently no generalization can be drawn for the 
quantitative items: CQS and CQT means bracket 
the counterpart X means despite the fact that the 
CQS and CQT means should be equal on account of 
the method used to build the half-tests. 

The tendencies described above must be re- 
garded skeptically because the t tests revealed
no significant differences between the means ex- 
cept in one case. Furthermore, one might attrib- 
ute the apparent, although not significant, super- 
iority of XT to XS to the fact that XT data were 
gathered subsequent to the XS data thereby re- 
flecting practice effect. 

In summary, the data to this point do not gen- 
erally indicate, at the .05 level, differences in
mean scores which are functions of administrative 
conditions. However, the control data were col- 
lected three months prior to the collection of ex- 
perimental data. Nevertheless, the intercorrela- 
tions which appear in Table III indicate that sub-
jects would fall in roughly the same order on the 
experimental test as they were ordered by the 
control test. Thus, the data to this point would, 
in a limited way, lend support to the administra- 
tion of tests by television if a pacing technique is 
used. 

Comparisons by Item Types. The preceding 
analysis dealt with means on the Q and V sections
of the test inasmuch as Q and V scores are gen-
erally reported when the School Ability Test is
used. However, the Q and V scores, respective- 
ly, are established by summing success over two 
parts of the test each of which has relatively dis- 
tinct kinds of items. Sample items illustrate this 
difference. 





Verbal 
Part I We had worked hard all day so that by
evening we were quite ( ). 
A. small B. tired C. old 
D. untrained E. intelligent 


Part III Chilly 
A. tired B. nice C. dry 
D. cold E. sunny 


Thus, the items in Part I contain a sentence as
a stem, whereas in Part III the item stem is
merely a stimulus word. 


Quantitative 
Part II 5413 A. 586 B. 596 C. 696
-4827 D. 1586 E. None of these







TABLE III 


CORRELATIONS BETWEEN SCORES ON CONTROL AND 
EXPERIMENTAL HALF-TESTS OF SAT, 3A 





CVS CVT XVS XVT CQS CQT XQS

.879 .935 .864

.869 .865

.854





TABLE IV 


‘‘t’’ TEST OF MEAN DIFFERENCES CORRECTED FOR
CORRELATED PERFORMANCES


CVS CVT XVS XVT CQS

.55 -2.03 -1.59

-1.99 -2.07*

-.25
Part IV Four $10-bills are equal to how many $5-bills?
A. 20 B. 10 C. 8
D. 40 E. 2


Thus, the items in Part II consist of stems
containing a set of numbers and an operations sign, 
whereas the items in Part IV consist of stems 
which are statements of word problems involving 
arithmetic. 

Analyses were done to determine whether dif- 
ferences existed between success on particular i-
tem types in association with different conditions 
under which the items were administered. The 
relevant data appear in Tables V (means and stan- 
dard deviations), VI (intercorrelations) and VII 
(t’s) on the next pages. 

The tabular data reveal for the items in the two 
verbal parts of the test that regardless of whether 
the item contained a verbal statement or not the 
experimental administrative procedures led to 
slightly higher mean scores than did the normal 
administrative conditions. Whether this is due to 
the effect of the experimental conditions or the 
fact that the experimental data were collected 
three months after the control data is very much 
in doubt. On the quantitative items a differential 
in response accuracy is associated with method of 
test administration. On the quantitative items 
having statements for stems, the experimental 
administrations led to higher mean scores than 
did the control administration. However, on the 
quantitative items which had for a stem a series 
of numbers and an operation sign, the control
means were significantly different from and great- 
er than the experimental means. This finding is 
worthy of comment. 

It is felt that the reduced mean scores on the
quantitative no-stem items are attributable to any
or all of the following. Since the problems were
so involved, the examinee probably copied the data
from the screen to his scratch paper in order to
solve them, thus 1) reducing his working time due to
the time involved in copying, 2) committing cler-
ical errors when copying, and 3) after solving the
problem, needing to locate a similar response on
the screen, during which errors in remembering
might have occurred. Only further research will
provide definite knowledge of the reasons for the 
reduced scores. It should be noted at this time, 
though, that non-response to these items was sig-
nificantly less than the amount of non-response on 
the same items when administered normally. 
Whether the poorer performance is due simply to 
guessing or to factors noted above is problemati- 
cal. 

Another factor worthy of specific mention is 
that the intercorrelations of the quantitative items 
with regard to method of administration are of a 
lesser magnitude than comparable intercorrela-
tions for the verbal items. Generally the lowest 





intercorrelations are based on the no-statement 
quantitative items. 

Analysis of Omits. It was hypothesized that 
subjects would be more likely to give a greater 
number of responses to the items when experi- 
mentally presented than when normally presented 
simply because all items would be exposed to the 
examinees on the experimental administration 
whereas on the normal administration their test
taking habits might have caused them to omit
some items. Also it was hypothesized that there
would be no difference in response frequency be-
tween the triad and single experimental presenta-
tions. Both of these hypotheses held at the .05
level of confidence. The Wilcoxon Matched-Pairs
Signed-Ranks Test was used in computing the sig-
nificance of differences. The pertinent data are 
presented in Table VIII. 

Thus it appears that the experimental test ad- 
ministration procedures, i.e., pacing examinees 
over all items, leads to a greater number of item
responses than does the normal administrative 
procedure during which the examinees are re- 
sponsible for their own pacing. 

Although differences in non-response are as- 
sociated with mode of test administration, the 
data were analyzed to determine if non-response 
was related to certain personality variables on 
one kind of administration but not on the other. A 
response accuracy score was calculated for each 
examinee on the basis of his responses on the XQS 
and CQS administrations, separately. The ac- 
curacy score for a subject was equal to the ratio
of his ‘‘rights’’ score to the total number of items
he responded to. The score was computed in this
way to discount differences in ability which would
affect the subject’s probability of non-response.
The personality variables which were evaluated 
with regard to response accuracy were ‘‘Cau- 
tiousness’’ from the Gordon Personal Inventory 
and ‘‘Reflective’’ and ‘‘Impulsive’’ from the Thur- 
stone Temperament Schedule. A 2 x 2 χ² anal-
ysis produced the results shown below. 





CQS response accuracy
vs Cautious χ² = .00 p > .05
vs Reflective χ² = 2.28 p > .05
vs Impulsive χ² = .86 p > .05


XQS response accuracy
vs Cautious χ² = .65 p > .05
vs Reflective χ² = .00 p > .05
vs Impulsive χ² = .00 p > .05

The data indicate that tendency to omit items 
is not related to these three personality variables 
on either the normal or experimental test admin- 
istrations. 
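
The response accuracy score and the 2 x 2 analysis can be sketched as follows. The accuracy ratio is as defined in the text. The chi-square function uses the familiar shortcut formula for a fourfold table without a continuity correction, which is an assumption, since the article does not say how subjects were split into high and low groups or whether a correction was applied. The function names are hypothetical.

```python
def accuracy_score(rights: int, attempted: int) -> float:
    """Ratio of items answered correctly to items actually responded to."""
    if attempted <= 0:
        raise ValueError("subject must have responded to at least one item")
    return rights / attempted


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for the fourfold table [[a, b], [c, d]], uncorrected."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column total must be nonzero")
    return n * (a * d - b * c) ** 2 / denom


# A subject with 9 rights among 12 attempted items (hypothetical values).
print(accuracy_score(9, 12))

# Hypothetical split: high/low accuracy by high/low on a personality scale.
print(chi_square_2x2(8, 7, 7, 7))
```

A table whose cells are exactly proportional, such as 5, 5, 5, 5, gives a chi-square of zero, which is the pattern behind the .00 entries reported above.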

Intercorrelations. The intercorrelations be- 
tween control and experimental administrations 
of the experimental tests appear in Table IX. The 
TABLE V


MEANS AND STANDARD DEVIATIONS OF PART SCORES ON
HALF-TESTS FROM SCHOOL ABILITY TEST


Verbal Quantitative
Condition S*(I) NS**(III) Condition S(IV) NS(II)
x sigma x sigma x sigma x sigma

9
2. § -QS . : 9.14
8.10
. 48
. 31

**no stem


TABLE VI 


CORRELATIONS BETWEEN PART SCORES ON HALF-TESTS 
FROM SCHOOL ABILITY TEST 





Verbal Quantitative
Variable Pair S(I) NS(III) Variable Pair S(IV) NS(II)
r r r r

CVS-XVS )
CVT-XVT . 76 Wi CQT-XQT . of . 48
XVS-XVT XQS-XQT a -41





TABLE VII


‘‘t’’ TESTS OF DIFFERENCES BETWEEN CORRELATED MEANS OF
PART SCORES FROM HALF-TESTS OF SCHOOL ABILITY TEST


Verbal Quantitative
Variable Pair S(I) NS(III) Variable Pair S(IV) NS(II)
t

CQS-XQS 2.29* 3.28*
CVT-XVT CQT-XQT . 03* 3.00*
XVS-XVT XQS-XQT

*p < .05
TABLE VIII


COMPARISON OF NON-RESPONSE ON EXPERIMENTAL
AND CONTROL ADMINISTRATIONS


Non-Response Hypothesis Significance Level

XVT = XVS No difference
XVT < CVT .025
XVS < CVS .01
XQS No difference
CQT .005
.005





TABLE IX


INTERCORRELATIONS OF CONTROL AND EXPERIMENTAL
ADMINISTRATIONS OF EXPERIMENTAL TEST


CVT XVS XVT CQS
pertinent correlations are those which are based
on scores over the same items which were ad- 
ministered under experimental and control condi- 
tions. For the verbal part of the test, CVS and 
XVS correlate at .93, and CVT and XVT correlate
at .86, thus indicating rather high association
between scores collected under control and exper-
imental conditions of test administration. If these
intercorrelations are regarded as equivalence
coefficients, then it is proper to adjust them for
a test twice as long inasmuch as they are based
on half-tests. Such correction by the Spearman-
Brown formula produces the following correla-
tions: CVS and XVS, .96; and CVT and XVT, .92.
For the quantitative part of the test, CQS and XQS
correlate at .79, and CQT and XQT correlate at
.70. Raising these for a test twice the length
yields .88 and .82, respectively. The adjusted 
correlations for the quantitative section are rela- 
tively high but probably indicate an unsatisfactory 
level of reliability. The difference in magnitude 
between the correlations based on the verbal part 
of the test and the correlations basedon the quan- 
titative part of the test is probably due in part to 
the difficulty described earlier which the subjects 
encountered when attempting to solve quantitative 
items when presented visually. 
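
The Spearman-Brown adjustment used above can be checked with a short calculation. The following is only a sketch: the function name and the dictionary of variable pairs are illustrative, and the half-test correlations are the four values quoted in the text. For a test lengthened by a factor k, the formula is k r / (1 + (k - 1) r).

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Predicted correlation for a test `factor` times as long as the one giving r_half."""
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Half-test correlations quoted in the text (control vs. experimental administration).
half_test_r = {"CVS-XVS": 0.93, "CVT-XVT": 0.86, "CQS-XQS": 0.79, "CQT-XQT": 0.70}

# Step each coefficient up to a test of double length and round to two places.
adjusted = {pair: round(spearman_brown(r), 2) for pair, r in half_test_r.items()}

for pair, r in adjusted.items():
    print(f"{pair}: {r:.2f}")
```

The rounded results agree with the adjusted values reported in the paragraph (.96, .92, .88, and .82).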


Factor Matrix 





All subjects were administered the experimen- 
tal test under normal and experimental conditions. 
In addition, all subjects were administered Tests 
3-7 of the Iowa Tests of Educational Development, 
the Thurstone Temperament Schedule, the Gordon 
Personal Profile, the Gordon Personal Inventory, 
the SRA Primary Mental Abilities, the Iowa Silent 
Reading Tests, and the DAT Clerical Speed and 
Accuracy Test. These additional tests were administered
to provide data on the basis of which to determine whether
the experimental administrative procedures tended to
produce differences in the factor content of scores from
the experimental test when administered under control and
experimental conditions. The correlation matrix was
factored by the centroid method, and the factor matrix
after a quartimax rotation appears in Table X. No
noteworthy differences were found between the factorial
content of the experimental test administered under
normal and experimental conditions.

Observations During Experimental Testing.
Subject reactions to the experimental testing and
experimenter observations are briefly summarized below.
The subjects reported no special difficulty in dealing
with the experimental administration of items from the
verbal parts of the School Ability Test. However, they
reported difficulty in solving items from the quantitative
sections. Subjects reported, and the experimenter
observed, that after solving a quantitative item by
calculating on scratch paper, they would view the screen
to locate the correct alternative only to find that a new
item had taken the place of the item they had been
solving. Subjects reported that this occurred frequently,
and they claimed the experience was frustrating. These
reactions were particularly noticeable on the single-item
presentation, but less so on the triad presentation. Of
the two experimental conditions, subjects indicated a
preference for the triad presentation.

Subjects appeared to be restless and inattentive when
presented easy items. This is probably due to the fixed
amount of time allotted to each item regardless of
difficulty. Since no audible signal was given to indicate
the removal of an old item and the presentation of a new
item on the screen, subjects who were not paying attention
to the screen sometimes noticed the new item only after it
had been on the screen for several seconds.

There were other reactions but those above 
are the noteworthy ones. In subsequent studies, 
alterations of the experimental conditions design- 
ed to dispel adverse student reaction as describ- 
ed above will be tested. 


Summary 


In the main, the attempt to reproduce by vis- 
ual presentation the test results obtained by nor- 
mal presentation was successful. Certain diffi- 
culties encountered by the subjects in working 
from screen to paper to screen were identified. 
There did not appear to be important differences
in performance between normal and experimental 
conditions which were associated with certain 
personality attributes. The factorial content re- 
mained substantially constant. 

Both the analysis of subject performance and the
experimenters’ observations lead us to think that
television testing will make possible a greater test
coverage of an area within a given unit of time than is
commonly attained by traditional methods. This is true
both because variable exposure intervals which are
functions of item difficulty and type will probably lead
to greater efficiency and because the method appears to
carry the subject further through the total test.

Much remains to be done. Studies that underlie the
determination of principles by which to maximize the
efficiency of TV testing are under way. Studies focused
upon the audio presentation of test items are likewise
progressing. The critical conditions which lead to altered
factorial content are to be determined. There also remains
the investigation of the results obtained under the
mutually reinforced presentation by audio and visual
means. Mechanical problems, such as adapting answer
sheets, remain to be solved.

The results to date encourage these experimenters to
continue the basic studies which are necessarily
antecedent to the development of principles of television
testing.


NONNORMALITY AND PRODUCT MOMENT
CORRELATION* 


RAYMOND C. NORRIS 
George Peabody College for Teachers 
HOWARD F. HJELM 
United States Office of Education 


ALTHOUGH THE Pearson product moment correlation
coefficient has been used as an index of relationship
between two variables for more than 50 years, there
continues to be considerable controversy concerning the
circumstances under which its use is appropriate. As
recently as 1957 and 1958 The American Psychologist
carried comments by Nefzger and Drasgow (8), Furfey (5),
LaForge (6), and Milholland (7) on the question of whether
or not it is necessary to assume normality in the
distributions on which the correlations are based. Binder,
in a more recent article in The American Psychologist (2),
attempted to show that much of the controversy has been
misdirected because of a lack of understanding of the
place of assumptions in the interpretation of the
correlation coefficient. By developing three different
mathematical models, Binder shows that different
inferential interpretations of the correlation coefficient
grow out of the different mathematical models and
concludes, ‘‘The preference for the bivariate normal model
to a more general model stems from its great deductive
power and its usefulness in many empirical situations.’’

The extent to which failure to satisfy the assumption of
bivariate normality distorts the interpretation of the
product moment coefficient was the subject of numerous
studies in the period 1929 through 1932. Egon Pearson
(9, 10, 11) conducted a series of studies in which the
score distributions deviated from bivariate normal in
various ways and concluded that the sampling distribution
of the product moment coefficient was not seriously
affected by failure to meet this theoretical assumption.
His findings were generally confirmed by other empirical
investigations by Dunlap (4), Cheshire, Oldis, and Pearson
(3), and Rider (12). Baker (1), dealing with a bivariate
distribution with markedly skewed marginal frequencies,
concluded that for samples of size forty the obtained
sampling distribution was also skewed and that the usual
tests of significance would be misleading.

Marginal distributions in these studies deviated from the
normal in both skewness and kurtosis. Population
correlations ranged from zero to .83, sample sizes from 5
to 52, and sampling distributions from 50 coefficients to
1770. In most of the work sampling distributions were
based on 500 or fewer correlations, and Baker’s study (the
only one which raised serious question about the
distortion of the obtained sampling distribution) was
based on only 50 samples of size 40.

The current study was undertaken in an attempt 
to pin down any empirical effects of nonnormality 
by taking larger numbers of samples than had 
been used previously and determining the obtained 
sampling distribution in each case. Sample sizes 
used were in the range of ones frequently en- 
countered in educational and psychological re- 
search. Several types of nonnormality frequently 
encountered in such research were used, and the 
population correlations of zero and .83 represented
the range of values usually encountered.


Procedure 


Ten populations of size ten thousand were established. A
population having approximately no correlation and a
population having substantial correlation were established
for each of the following bivariate forms: normal,
rectangular, leptokurtic, slightly skewed, and markedly
skewed. The ten populations were punched onto one set of
ten thousand IBM cards. The ten thousand cards were placed
in random order by ordering four columns of random digits.
The samples were obtained by repeatedly counting off the
desired number of cards. The deck was sampled without
replacement and to exhaustion. The deck was then reordered
by using another set of four columns of random digits and
samples were again counted off. This was repeated
twenty-four times in order to obtain the desired number of
samples. This necessitated having five decks of ten
thousand cards each, with the sixty columns of data being
identical for each deck, leaving a total of one hundred
columns for random digits. Sample sizes 15, 30, and 90
were obtained concurrently, with the first two samples of
size 15 comprising the first sample of size 30, and the
first three samples of size 30 comprising the first sample
of size 90. Correlation coefficients were calculated for
the samples, and sampling distributions for sample sizes
15, 30, and 90 were established for each of the ten
populations. As indicated in Table I, the sampling
distributions consisted of 15,984 coefficients for sample
size 15; 7,992 coefficients for sample size 30; and 2,664
coefficients for sample size 90. The sampling and
calculating were done by means of the IBM 650 Data
Processing System.

*The research reported herein was performed pursuant to a
contract with the Office of Education, United States
Department of Health, Education and Welfare.
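
The sample counts reported for Table I follow from the card-deck procedure, and the arithmetic can be sketched as below. This is an illustration under stated assumptions, not the authors' IBM 650 program: the names are hypothetical, it assumes 24 complete passes through the 10,000-card deck in total (the figure that reproduces the reported totals), and it assumes that cards left over at the end of a pass were discarded.

```python
POPULATION_SIZE = 10_000  # cards in the deck
PASSES = 24               # assumed number of complete passes through the deck


def n_samples(sample_size: int, population_size: int = POPULATION_SIZE,
              passes: int = PASSES) -> int:
    """Whole samples counted off per pass (leftover cards dropped), times the passes."""
    return (population_size // sample_size) * passes


counts = {n: n_samples(n) for n in (15, 30, 90)}

for n, total in counts.items():
    print(f"N = {n}: {total} coefficients")
```

Because two samples of size 15 make up one sample of size 30, and three of size 30 make up one of size 90, each pass gives 666, 333, and 111 samples respectively, which multiplies out to the totals quoted in the text.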

The obtained sampling distributions were compared with
the appropriate theoretical sampling distributions
assuming bivariate normality and with the obtained
sampling distributions from the bivariate normal
populations. Fisher’s transformations,

$$z = \tfrac{1}{2}\log_e\frac{1+r}{1-r},$$

were used in establishing the theoretical sampling
distributions. The obtained and theoretical sampling
distributions were tested for overall goodness of fit by
means of Kolmogorov-Smirnov tests. Critical values were
established for $r_{.005}$, $r_{.025}$, $r_{.975}$, and
$r_{.995}$, and frequency tabulations were made of the
number of significant coefficients in each of the obtained
sampling distributions.
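
One way such critical values can be obtained from Fisher's transformation is sketched below. It assumes the usual large-sample approximation, in which z is treated as normally distributed about the transformed population correlation with standard error 1/sqrt(N - 3); the source does not say which standard error or bias correction was actually used, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def fisher_z(r: float) -> float:
    """Fisher's transformation z = (1/2) log_e((1 + r) / (1 - r))."""
    return 0.5 * math.log((1.0 + r) / (1.0 - r))


def critical_r(rho: float, n: int, p: float) -> float:
    """Value of r cutting off cumulative probability p, assuming z is normal
    about fisher_z(rho) with standard error 1 / sqrt(n - 3)."""
    z_p = NormalDist().inv_cdf(p)
    return math.tanh(fisher_z(rho) + z_p / math.sqrt(n - 3))


# Critical values for a population correlation of zero at the three sample sizes.
critical = {
    n: {p: critical_r(0.0, n, p) for p in (0.005, 0.025, 0.975, 0.995)}
    for n in (15, 30, 90)
}

for n, values in critical.items():
    print(n, {p: round(r, 3) for p, r in values.items()})
```

Counting how many obtained coefficients fall beyond these cutoffs gives the kind of frequency tabulation described in the text.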


Results 


The obtained sampling distributions from the normal
(control) population having no correlation approximated
their theoretical distributions very closely. The
Kolmogorov-Smirnov goodness-of-fit test showed no
departure from the theoretical distribution for samples of
sizes 30 and 90, and the discrepancy between obtained and
theoretical distributions for sample size 15 was
significant only at the .20 level of significance. The
closeness of the fit even in this case was remarkable.
Inspection of Table II and Graph 1 will also show that for
the normal case the relative incidence of significant
correlations conformed very closely to the theoretical
proportions.

The close correspondence of the obtained and 
theoretical sampling distributions where the pop- 
ulation correlation was approximately zero and 
the score distribution approximately normal was 
interpreted as providing assurance of the adequacy 
of the sampling procedures used in this study. In 
particular, the effects of using discrete, finite 
populations in place of the theoretical continuous 
and infinite populations did not appear to materi- 
ally influence the results. Nor did sampling pro- 
cedures, as such, appear to cause deviation of 
the empirically determined distributions from the 
theoretical expectations. 

The empirical sampling distributions based on a
rectangular population conformed very closely to their
theoretical distributions. However, the Kolmogorov-Smirnov
test indicated a significant departure between the two for
sample size 15. Although the probability of such a
discrepancy occurring by chance alone was less than .05,
the magnitude of the discrepancy was small. Inspection of
Table II and Graph 2 will also show that the incidence of
significant correlations did not deviate markedly from the
theoretical proportions at either the .05 or .01 levels of
significance. All considered, there was little evidence
that sampling from a rectangular distribution produced
sampling distributions markedly different from those
obtained when the assumption of bivariate normality was
met.

The obtained sampling distributions based on a
leptokurtic population deviated from their theoretical
distributions in the same fashion as did those based on
the normal population, but the magnitude of the
discrepancies was much greater. The differences were
significant at the .05 level for samples of size 15 and 90
(see Graph 13). For samples of size 90 significant
correlations were obtained only .69 as often as expected
at the .05 level of significance and .60 as often as
expected at the .01 level. Although there was some
evidence that the sampling distributions of correlation
coefficients based on a leptokurtic population tended to
be leptokurtic also, the departure from the theoretical
distribution was not very great.

The obtained sampling distributions based on slightly
skewed or markedly skewed populations appeared to bear
very similar relationships to their theoretical
distributions. Kolmogorov-Smirnov tests on the six
comparisons indicated no significant departure from the
theoretical distributions at the .20 level of significance
except for samples of size 15 based on the markedly skewed
population. The departure in that case was one which could
occur between ten and fifteen percent of the time by
chance alone. Reference to Table II and Graph 5 shows that
even here the relative incidence of significant
coefficients did not deviate markedly from theoretical
expectations. Only in the fact that there was a tendency
to have too few significant negative correlations and too
many significant positive correlations was there any
suggestion that sampling distributions based on skewed
populations would be skewed also.

In general, it was concluded that for popula- 
tions in which there was no correlation the sam- 
pling distributions based on nonnormal popula- 
tions did not differ markedly from those where 
the theoretical assumptions of normality could be 
met. For samples as small as 15 or as large as 
90, the critical regions contained approximately 
the correct proportion of extreme coefficients at 
either the .05 or .01 levels of significance. Use 
of the theoretical distribution in place of some 
more exact sampling distribution would not ser- 
iously influence the probability of making errors 
of the first kind. 
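
A small Monte Carlo check of this conclusion can be sketched as follows. It is not a reconstruction of the study: it draws from continuous, infinite populations rather than 10,000-card decks, makes the two variables exactly independent, uses far fewer samples, and takes the exponential distribution as a stand-in for a "markedly skewed" population. All names are illustrative.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Product moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rejection_rate(draw, n, n_samples, alpha=0.05, seed=0):
    """Share of samples whose |r| exceeds the two-tailed Fisher-z critical value
    when the two variables are drawn independently (population correlation zero)."""
    rng = random.Random(seed)
    r_crit = math.tanh(NormalDist().inv_cdf(1 - alpha / 2) / math.sqrt(n - 3))
    hits = 0
    for _ in range(n_samples):
        xs = [draw(rng) for _ in range(n)]
        ys = [draw(rng) for _ in range(n)]
        if abs(pearson_r(xs, ys)) >= r_crit:
            hits += 1
    return hits / n_samples


# Rectangular (uniform) and markedly skewed (exponential, an assumed stand-in) margins.
rate_rect = rejection_rate(lambda rng: rng.random(), n=15, n_samples=2000)
rate_skew = rejection_rate(lambda rng: rng.expovariate(1.0), n=15, n_samples=2000)

print(f"rectangular: {rate_rect:.3f}  skewed: {rate_skew:.3f}  nominal: 0.050")
```

If the conclusion above holds, both rates should fall near the nominal .05, within the sampling error of a 2,000-sample run.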

Sampling distributions based on samples of


TABLE I


SIZE AND NUMBER OF SAMPLES FOR EACH EMPIRICAL
SAMPLING DISTRIBUTION

[Cell values not legible in the source. Rows: Normal,
Rectangular, Leptokurtic, Slightly Skewed, Markedly Skewed.
Columns: Population Correlation (Small, Large); Number of
Samples (N = 15, N = 30, N = 90).]


TABLE II


PROPORTION OF OBTAINED COEFFICIENTS IN CRITICAL REGION (ρ = .00)

                                     .05 Level of Signif.     .01 Level of Signif.
Marginal          Population    N    Lower   Upper   Both     Lower   Upper   Both
Distribution      Correlation        Tail    Tail    Tails    Tail    Tail    Tails

Normal            .016          15   n.l.    n.l.    n.l.     n.l.    n.l.    n.l.
Rectangular       .019          15   n.l.    n.l.    n.l.     n.l.    n.l.    .0113
Leptokurtic       .004          15   n.l.    n.l.    n.l.     n.l.    n.l.    .0106
Slightly Skewed   .010          15   .0220   n.l.    .0476    .0052   n.l.    .0115
Markedly Skewed   .012          15   .0208   .0299   n.l.     .0036   n.l.    .0114

Normal            .016          30   .0218   .0255   .047     .0053   .0054   .0106
Rectangular       .019          30   .0226   n.l.    .0454    .0058   .0048   .0105
Leptokurtic       .004          30   .0219   n.l.    .0442    .0051   .0043   .0094
Slightly Skewed   .010          30   .0224   n.l.    .0498    .0046   .0068   .0114
Markedly Skewed   .012          30   .0213   .0303   .0516    .0031   .0071   .0103

Normal            .016          90   .0259   .0229   .0488    .0060   .0056   .0116
Rectangular       .019          90   .0252   .022    .0477    .0034   .0034   .0068
Leptokurtic       .004          90   .0188   .0158   .0345    .0030   .0030   .0060
Slightly Skewed   .010          90   .0199   .0267   .0465    .0041   .0071   .0113
Markedly Skewed   .012          90   .0236   .0248   .0484    .0038   .0068   .0105

n.l. = not legible in the source. The N column is not legible;
the three blocks are taken to be N = 15, 30, and 90 in that order.


[Graph: Based on Normal Population (ρ = .016, N = 15)]


size 15 and 30 from a bivariate normal population having
substantial correlation (ρ = .83) deviated from their
theoretical distributions at the .01 level of
significance. For sample size 90 the Kolmogorov-Smirnov
test did not yield a significant difference at the .20
level. However, the deviation of the obtained from the
theoretical distribution was in no case extremely
pronounced. Inspection of Table III shows that for both
tails combined the incidence of significant coefficients
approximated the theoretical number at the .05 and .01
levels of significance. However, there was a general
tendency for there to be proportionately too many
significant results in the positive tail of the
distribution and proportionately too few in the negative
tail. This disparity decreased as sample size increased.

The deviation of the obtained sampling distributions from
the theoretical ones based on the assumption of normality
of distribution for Fisher’s statistic cast doubt on the
validity of that assumption for values of rho as great as
.835. For the purpose of determining the effect of
nonnormality of population distributions on the sampling
distributions of the product moment coefficient, the
obtained versus theoretical comparisons seemed somewhat
inappropriate. Consequently, the obtained sampling
distributions based on nonnormal populations were compared
also with the obtained sampling distributions based on the
normal population.

The empirical sampling distributions for all sample sizes
from the rectangular population having substantial
correlation deviated from their theoretical distributions
at the .01 level of significance. This departure was in
the same direction as in the normal control population,
but the deviation was much more pronounced. Table II shows
that the critical regions of the obtained distributions in
no case contained more than .477 of the number of
significant coefficients obtained from the normal
population. The disproportionately low incidence of
significant coefficients became more extreme for increased
sample size and for the lower level of significance. In
one tail of the sampling distribution based on samples of
size 90, there were no coefficients found as low as the
theoretical value of $r_{.005}$.

All available evidence indicated that the sampling
distributions of product moment coefficients based on a
rectangular population were much more leptokurtic than
either their theoretical distributions or the obtained
distributions based on a normal population. At the .05
level of significance for both tails combined, the
empirical distributions contained between .381 and .214 of
the number of significant correlations found in the
empirical distributions based on the normal population. At
the .01 level of significance the corresponding
proportions were .254 and .079. Rectangularity appeared to
have a pronounced effect on the sampling distributions of
the product moment correlation coefficient.

There was strong evidence that the sampling distributions
based on a leptokurtic population containing substantial
correlation were substantially more platykurtic than
either their theoretical distributions or the obtained
sampling distributions based on a normal population. Table
II shows the proportions of significant coefficients in
all critical regions were higher than anticipated from the
theoretical distributions and from two to more than seven
times as great as those obtained from the normal
population.

All sampling distributions based on skewed populations
having substantial correlation deviated from their
theoretical distributions at the .01 level of
significance. Table II indicates that all critical regions
at the .05 and .01 levels of significance contained a
greater proportion of significant correlations than
expected from the theoretical distributions and a still
greater proportion than found in the empirical
distributions based on the normal populations.

Several general observations seemed warranted on the
basis of the data from this study. They are:


1. When there was essentially no correlation 
in the population, the shape of the sampling dis- 
tributions for the product moment correlation co- 
efficients did not vary markedly as a function of 
nature or extent of nonnormality in the bivariate 
distribution. In general, these obtained sampling 
distributions conformed very closely to their the- 
oretical distributions. 

2. When there was substantial correlation in the
population, the shape of the sampling distribution based
on a bivariate normal population departed markedly from
its theoretical distribution. Fisher’s z transformation
was apparently ineffective in providing a normally
distributed statistic for values of rho as great as .835.

3. When there was substantial correlation in the
population, the sampling distributions based on nonnormal
populations deviated from their theoretical distributions
and from the distribution empirically determined for a
bivariate normal population. The nature and direction of
the deviation were specific to the type of nonnormality in
the population, some sampling distributions appearing
leptokurtic and others platykurtic.

4. Discrepancies between proportions of theoretical and
obtained significant correlations were relatively greater
at the .01 level of significance than at the .05 level.
Some of the obtained sampling distributions contained too
many correlation coefficients in the extreme tails and too
few coefficients in the less extreme sections of the
tails. In other sampling distributions the reverse was
true. Due to the inconsistency of the effects of
nonnormality on the tails of the sampling distribution, it
did not appear possible to adjust for nonnormality by
modifying the level of significance in either direction.

5. When there was substantial correlation in 
the populations, increased sample size appeared 
to accentuate the discrepancy between the ob- 
tained sampling distributions and either their the- 
oretical distributions or the empirical distribu- 
tion on a bivariate normal population. 

6. Sampling distributions based on skewed populations
yielded disproportionate numbers of significant
correlations in the two tails of the distribution whether
there was correlation in the populations or not. Where rho
was approximately zero, the two-tailed tests yielded
approximately the expected proportion of extreme
coefficients, but one-tailed tests based on the same
distribution did not.


PUPIL DISCOVERY VS. DIRECT
INSTRUCTION*


WILLIS E. RAY 
Ohio State University, Columbus 


EMPIRICAL EVIDENCE does not support any one consistent
set of hypotheses with regard to the relative efficacy of
pupil discovery versus teacher dominated direct
instruction (1, 2, 6, 14). This is particularly true as
these teaching methods relate to or interact with levels
of intellectual ability of pupils (7, 8, 10).

It was the purpose of the present investigation (9) to
provide additional experimental and applied research
evidence as to the relative effect of directed discovery,
in situations providing numerous problem solving
opportunities, upon initial learning, retention, and
transfer of micrometer measurement principles and skills,
as compared with traditional direct and detailed
instruction in these situations, with three levels of
intelligence.


The Problem 


Many teachers instruct their pupils by the use of direct
and detailed instruction. This method is sometimes
characterized as the shortest road to the specific goal of
efficient learning (3). By this method, the subject
material being considered is presented to the pupils in
great detail. The teacher poses problems and proceeds
directly to the task of solving these problems without
allowing much opportunity for the pupil to discover
methods of solution or principles involved in their
solution. The learner is actively or passively listening
to and presumably assimilating principles and procedures
for the correct solution of the problems presented. In
many cases the pupils actually memorize the principles and
procedures without fully understanding the relations among
various items or steps which are presented one after the
other by the teacher.

Two questions might be asked concerning direct and
detailed instruction: 1) to what extent does the pupil
retain ideas, facts, and principles learned in this way?
and 2) how widely does the pupil apply this newly obtained
material to new and related situations?

Studies in retention of learning reveal that active
involvement of the learner in the learning situation
retards forgetting (6, 11, 12). Research in transfer of
learning has indicated that specific ‘‘rules’’ dealing
with quantitative problems at least temporarily blind the
learner so that he fails to adapt a learned approach to
solve effectively novel problems requiring somewhat
different techniques for solution (5, 13).

A method of teaching which would allow the pupil to
become ‘‘active’’ in the learning situation may possibly
have positive effects with regard to retention. If the
pupil is allowed to discover relationships and methods of
solution for himself, make his own generalizations and
draw conclusions from them, he may then be better prepared
to make wide applications of the material learned.

It is entirely possible that a directed discovery method
of teaching may be most effective with pupils of high
mental ability. Likewise, it is possible that a direct and
detailed method of instruction may be most effective with
pupils of low mental ability. Direct instruction,
repetition, and drill have been common techniques used
with the slow learner.

From these postulations the following hypothe- 
ses were generated and tested: 

1. There is no difference in initial learning 
(knowledge of specific facts and principles, abil- 
ity to solve problems, and actual manipulative 
performance) between groups taught by Method A 
and Method B. 

2. There is no difference in retention of ma- 
terial initially learned as measured one and six 
weeks after instruction between groups taught by 
Method A and Method B. 

3. There is no difference in the ability to transfer
effectively the knowledge acquired as measured one and six
weeks after instruction between groups taught by Method A
and Method B.

4. There is no interaction between the two teaching
methods employed and the low, average, and high
intellectual levels with regard to initial learning,
retention, and transfer of learning.


* This report is based upon a dissertation submitted in
partial fulfillment of the requirements of the Ed.D. at
The University of Illinois.


Procedure 


Experimental Design. A treatments x levels 
design was employed in the experiment. Two ex- 
perimental teaching methods (A and B) and one 
control (C) (no instruction) constituted the three 
treatments, and the three levels (high, average, 
and low) were determined with reference to men- 
tal ability (Pintner IQ) assumed to be related to 
the criterion. The levels were introduced to in- 
crease the precision of the design. 

The independent variables were 1) methods of 
instruction and 2) intelligence. The dependent 
variables were 1) initial learning as measured 
immediately following treatment, 2) retention and 
transfer as measured one week after treatment, 
and 3) retention and transfer as measured six 
weeks after treatment. The controlled variables 
included 1) content and method of presentation un- 
der Method A and likewise under Method B, 2) 
length of instruction, 3) illustrations, 4) precision 
measuring equipment, and 5) testing and testing 
conditions. 

The Sample. The 135 Ss initially selected for use in this
experiment were sampled at random from all ninth grade
boys enrolled in the three junior high schools of a medium
sized (35,000 population) midwestern community. All 158
available Ss were assigned to the three levels by a
‘‘counting off’’ procedure suggested by Lindquist (4). By
independent random sampling, 18 Ss were assigned to each
of the experimental treatments within each of the three
levels, and 9 Ss were assigned to the control group within
each of the three levels. Fifteen Ss were lost due to
absence during the six weeks of the experiment, and 3 Ss
were eliminated at random to bring the number of Ss in
each experimental cell to 15. No Ss were lost in the
control group. Therefore 117 Ss were considered in the
descriptive analysis of attributes and in the analysis of
the criterion data.
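
The cell counts in this paragraph can be tallied in a few lines. The sketch below simply restates the figures given in the text for the treatments x levels design; the variable names are illustrative.

```python
LEVELS = 3                   # high, average, and low ability
EXPERIMENTAL_TREATMENTS = 2  # Methods A and B

assigned_per_experimental_cell = 18
assigned_per_control_cell = 9
final_per_experimental_cell = 15

# Ss assigned at the outset: two experimental cells and one control cell per level.
assigned = LEVELS * (EXPERIMENTAL_TREATMENTS * assigned_per_experimental_cell
                     + assigned_per_control_cell)

lost_to_absence = 15
removed_at_random = 3
analysed = assigned - lost_to_absence - removed_at_random

# The same total, counted cell by cell after equalizing the experimental cells.
analysed_by_cells = LEVELS * (EXPERIMENTAL_TREATMENTS * final_per_experimental_cell
                              + assigned_per_control_cell)

print(assigned, analysed, analysed_by_cells)
```

Both routes give the 117 Ss reported for the analysis, starting from the 135 Ss initially assigned.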

Due to the effects of random sampling, no statistically
significant differences were found to
exist within comparable cells with reference to 
intelligence, standardized achievement, and age. 
The Ss were quite representative of the general 
population of ninth grade pupils with regard to so- 
cio-economic status. 

Learning Task. The zero to one inch micrometer caliper, a
precision measuring instrument, was selected to be used in
the teaching-learning process of the experiment. Ss were
taught the names and functions of the parts, facts about
the instrument, micrometer (readings to .001 inch) and
vernier (readings to .0001 inch) principles involved in
reading the tool, and how to manipulate and read the
instrument for actual measurement.

Instruction. Ss were taught in groups of nine, each group
being composed of three Ss from each of the three ability
levels. The teaching-learning session was 47 minutes long.
Common introductory material was presented to both
treatment groups for the first 7 minutes, and the
remaining 40 minutes were devoted to differential
treatment. To assure constancy of conditions from one
instructional group to another, tape recordings were used
of all oral instruction and 35 mm slides were used of all
illustrations (same set of 22 slides used with both
groups).

The control group was given no instruction. Ss 
in this group merely took the five criterion tests 
to determine how they would perform without in- 
struction. 

Method A, as developed, represented a tradi- 
tional approach to teaching skills and understand- 
ings, that of ‘‘direct and detailed’’ instruction, 
spoon feeding, or the ‘‘tell and do’’ method. By 
this method, the teacher presented the learning
material, reviewed important points, and solved 
several examples related to the material without 
any pause other than normal sentence break 
throughout the forty minute session. 

Method B, on the other hand, could be char- 
acterized as a method of ‘‘directed discovery”’. 
By this method the pupil was called upon to be ac- 
tive, to carefully study illustrative material on 
his own, and contemplate leading questions asked 
by the teacher. In Method B, 48 percent of the 
treatment time (19 minutes out of 40 minutes) was 
spent in silence which allowed the Ss to react to 
the leading questions, discover principles, and 
make generalizations. Very few positive statements
of fact and none of principle or generalization
were given orally by this method.

Criterion Measures. Three testing periods
were used to evaluate the effectiveness of instruc- 
tion. Immediately following treatment, the Ini- 
tial Learning criterion test was administered (50 
minutes). The Ss were given no indication that 
testing would occur one and six weeks after treat- 
ment. One week after instruction, the Retention 
One Week and Transfer One Week criterion tests 
were given (100 minutes). At a period six weeks 
after instruction the Retention Six Weeks and 
Transfer Six Weeks criterion tests were admin- 
istered (100 minutes). 

All tests were work limit or power tests and 
involved both performance and paper and pencil 
items. The initial learning and retention tests 
involved actual measurement of gauge blocks with
the micrometer caliper (performance) and paper
and pencil items designed to test knowledge of
facts and principles related to the use of the
precision measuring instrument. The transfer tests
involved both performance and paper and pencil
items that attempted to measure application of
knowledge gained to new but related situations. 
Performance with and knowledge of the depth mi- 
crometer caliper, the vernier caliper, and the 
vernier protractor (basic principles of use the 
same as with the vernier micrometer caliper) 
were measured by these transfer tests. 

Subtests and part scores resulting from the 
tests were carefully analyzed by correlational 
techniques. As a result of this analysis, total 
test scores were determined by the sum of the raw 
scores of subtests. 

Reliability (equivalence) coefficients ranging 
from .92 to .97 were obtained for the five criter- 
ion tests by the Kuder-Richardson Case IV (For- 
mula 21) estimate. 
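
As an illustration of the reliability estimate named above, the following is a minimal sketch of Kuder-Richardson Formula 21, which needs only the number of items, the mean total score, and the variance of total scores. The article does not report item counts, means, or variances for the five criterion tests, so the numbers in the usage line are hypothetical, and the formula assumes items scored right or wrong with roughly equal difficulty.

```python
def kr21(num_items, mean_score, score_variance):
    """Kuder-Richardson Formula 21 reliability estimate.

    num_items      -- number of right/wrong items on the test (k)
    mean_score     -- mean of the total scores (M)
    score_variance -- variance of the total scores (s^2)
    """
    if num_items < 2:
        raise ValueError("KR-21 needs at least two items")
    if score_variance <= 0:
        raise ValueError("score variance must be positive")
    k = num_items
    # r = k/(k-1) * (1 - M(k - M) / (k * s^2))
    return (k / (k - 1)) * (1 - mean_score * (k - mean_score) / (k * score_variance))


# Hypothetical test: 50 items, mean score 30, variance 100 (not from the study).
example_reliability = kr21(50, 30.0, 100.0)
```

Formula 21 is usually treated as a lower bound on the Formula 20 value when items differ in difficulty, so a sketch like this would tend to understate rather than overstate reliability.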

Statistical Treatment. Data were punched on 
IBM cards and various electronic machines in- 
cluding the digital computer were employed in an- 
alyzing the data. The analysis of variance tech- 
nique was used together with appropriate supple- 
mentary tests of homogeneity of variance. Cor- 
relational techniques and other methods employed 
were selected because of their adequacy in rela- 
tion to the data. The .05 level of confidence and 
beyond was considered significant for the purposes 
of this study. 





Findings 


The control group was included in the design of
the experiment to determine 1) if the ninth grade 
pupils used as Ss had prior knowledge of the 
learning task, and 2) if the Ss would obtain higher 
scores on tests subsequent to initial testing because
of learning from the tests, vicarious
instruction or communication among Ss between
testings, or personal study. The performance of 
Ss in the control group as shown in Figures 1 and 
2 indicates that none of the above factors was ap- 
parently operating in the experiment. Therefore, 
only the 90 experimental Ss were considered dur- 
ing the statistical analysis of criterion data. 

Test of Hypothesis I. Hypothesis I--There is
no difference in initial learning (knowledge of
specific facts and principles, ability to solve
problems, and actual manipulative performance)
between groups receiving Method A and Method B.

An analysis of variance was performed on the 
obtained data, and this analysis is presented in 
Table I. 

The analysis shows that the main treatments
effect and interaction (T X L) account for a
statistically insignificant amount of the variance
associated with the learning scores of the groups. The
F-test indicates that differences that are present
between groups can be attributed to ability level.
That is, significant differences in learning scores
exist between levels only. The outcome with regard
to levels was an anticipated one since the
experiment was designed to develop real differences
between levels. Resting on the data obtained and
the statistical technique employed to analyze these
data, the null hypothesis regarding initial learning
is tenable.
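
The F-ratios discussed above can be checked from the sums of squares reported in Table I. The sketch below assumes degrees of freedom of 1, 2, 2, and 84 for treatments, levels, interaction, and within sets; these follow from two methods, three ability levels, and 90 experimental Ss, but apart from the total of 89 they are not printed in the surviving copy of Table I, so treat them as an assumption. The function name is illustrative.

```python
def f_ratios(sums_of_squares, degrees_of_freedom, error_term="Within Sets"):
    """Return the F-ratio of every source against the error mean square."""
    mean_squares = {
        source: sums_of_squares[source] / degrees_of_freedom[source]
        for source in sums_of_squares
    }
    error_mean_square = mean_squares[error_term]
    return {
        source: mean_square / error_mean_square
        for source, mean_square in mean_squares.items()
        if source != error_term
    }


# Sums of squares as reported in Table I.
table_i_ss = {
    "Treatments": 313.7,
    "Levels": 15624.8,
    "Treatments X Levels": 596.7,
    "Within Sets": 25318.8,
}
# Assumed degrees of freedom (2 methods, 3 levels, 90 Ss).
table_i_df = {
    "Treatments": 1,
    "Levels": 2,
    "Treatments X Levels": 2,
    "Within Sets": 84,
}

table_i_f = f_ratios(table_i_ss, table_i_df)
```

Under these assumptions the treatments and levels ratios come out close to the reported 1.041 and 25.920; small discrepancies in the last digit would reflect rounding of the mean squares in the printed table.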

Test of Hypothesis II. During the early design
of this experiment, the investigator hypothesized 
that Ss receiving Method A would learn much 
more initially than those Ss receiving Method B. 
Of concern also, during the design stages of the 
experiment, was the fact that although Ss receiv- 
ing Method A may learn more initially, they may 
not retain as much of what they initially learned 
as that retained by Ss receiving Method B. With 
this in mind, Hypothesis II was formulated as fol- 
lows: 

Hypothesis II--There is no difference in retention
of material initially learned as measured one
and six weeks after instruction between groups
receiving Method A and Method B.

The clause ‘‘there is no difference in retention
of material initially learned’’ or what they did in
fact learn, was designed so that an examination
would be made of the difference between the initial
scores and scores made at one and six weeks.
This technique of analysis would reveal whether
Ss receiving Method A or Method B retained a
greater proportion of what they initially learned.
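
The difference-score technique described above amounts to subtracting each pupil's later score from the same pupil's earlier score before running the analysis of variance. The following is a small sketch of that step; the scores are invented for illustration, and the sign convention (earlier minus later, so that a positive value means loss) is an assumption, since the article does not state the direction of subtraction.

```python
def difference_scores(earlier, later):
    """Per-pupil difference between two testings (earlier minus later)."""
    if len(earlier) != len(later):
        raise ValueError("both testings must cover the same pupils")
    return [first - second for first, second in zip(earlier, later)]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical scores for three pupils (not data from the study).
initial_learning = [40, 35, 50]
retention_six_weeks = [38, 30, 50]

loss_by_pupil = difference_scores(initial_learning, retention_six_weeks)
mean_loss = mean(loss_by_pupil)
```

Because each pupil serves as his own baseline, the analysis compares how much was lost rather than how much was known, which is the distinction the hypothesis turns on.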

To test this hypothesis, analyses of variance 
were performed using difference scores and the 
results are shown in Tables II, III, and IV.

It can be seen from a study of Table II that
there was no significant difference in difference
scores between treatments one week after instruction.
Since the available data offer insignificant
evidence against the null hypothesis
(ΔI1_A = ΔI1_B), this null hypothesis is tenable.

However, during the ensuing five weeks, the 
group taught by the direct and detailed method 
failed to retain as much of what they did learn
when compared with the retention of material 
learned by Ss instructed by the directed discovery 
method. 

The analysis that supports this statement follows.
Since the F-value for treatments obtained
from Table III is significant, the available data
offer evidence against the hypothesis of no difference
(Δ16_A = Δ16_B). Bartlett’s test of homogeneity
of variance produced a chi-square of 1.327
(table value at .05 = 3.841). This test gives support
to the belief that the variances of the two
treatments are homogeneous. Since the F-test
applied to the same data resulted in the rejection
of the null hypothesis, and since Bartlett’s test
indicated that it is not the variances that differ
significantly, it appears that the significant value
of F is the result of a difference in the means of
the difference scores (D16).
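
The homogeneity check used above can be sketched as follows. This is a plain implementation of Bartlett's statistic for any number of groups; with two treatment groups it is referred to chi-square with one degree of freedom, whose .05 table value is the 3.841 quoted in the text. The sample data are invented, and the function names are illustrative rather than taken from the study.

```python
import math


def sample_variance(values):
    """Unbiased sample variance (divisor n - 1)."""
    center = sum(values) / len(values)
    return sum((v - center) ** 2 for v in values) / (len(values) - 1)


def bartlett_statistic(groups):
    """Bartlett's chi-square statistic for homogeneity of variance."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    if k < 2 or min(sizes) < 2:
        raise ValueError("need at least two groups of two or more scores")
    variances = [sample_variance(g) for g in groups]
    if min(variances) <= 0:
        raise ValueError("every group must have non-zero variance")
    total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (total - k)
    numerator = (total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances)
    )
    correction = 1 + (sum(1 / (n - 1) for n in sizes) - 1 / (total - k)) / (3 * (k - 1))
    return numerator / correction


# Hypothetical difference scores for two small groups (not study data).
equal_spread = bartlett_statistic([[1, 2, 3, 4], [11, 12, 13, 14]])
unequal_spread = bartlett_statistic([[1, 2, 3, 4], [1, 5, 9, 13]])
```

A value below the table value, as with the 1.327 reported here, leaves the hypothesis of equal variances tenable, so a significant F can be read as a difference in means.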
Considering the difference scores obtained by
taking the difference of the scores attained one
week after instruction and the scores six weeks
after instruction, the evidence indicates that at


TABLE I

ANALYSIS OF VARIANCE OF LEARNING SCORES ON INITIAL LEARNING TEST

Source                  Sum of Squares    df    Mean Square        F

Treatments                     313.7                  313.7      1.041
Levels                      15,624.8                7,812.4     25.920*
Treatments X Levels            596.7                  298.4
Within Sets                 25,318.8                  301.4

Total                       41,854.0      89

* Significant F at .05 = 3.11     Significant F at .01 = 4.87


TABLE II

ANALYSIS OF VARIANCE OF DIFFERENCE SCORES BETWEEN INITIAL
LEARNING AND RETENTION ONE WEEK

Source                  Sum of Squares    Mean Square

Treatments                      48.4           48.4
Levels                         291.7
Treatments X Levels            365.2
Within Sets                  7,584.3

Total                        8,289.6





six weeks, the Ss taught by the directed discovery
method (Method B) retained a relatively larger
proportion of material that they had at their
command one week after instruction when compared
with Ss that received Method A.

The difference scores obtained from the Initial
Learning and Retention Six Weeks data were
analyzed in Table IV. The F-value of 9.333 for
treatments provides significant evidence for
rejection of the null hypothesis of ΔI6_A = ΔI6_B.
However, Bartlett’s test of homogeneity of





TABLE III

ANALYSIS OF VARIANCE OF DIFFERENCE SCORES BETWEEN
RETENTION ONE WEEK AND RETENTION SIX WEEKS

Source                  Sum of Squares    df    Mean Square

Treatments                     864.9                  864.9
Levels                          30.4                   15.2
Treatments X Levels            168.3       2
Within Sets                  8,350.5      84           99.4

Total                        9,414.1      89

* Significant F at .05 = 3.96     Significant F at .01 = 6.95


TABLE IV

ANALYSIS OF VARIANCE OF DIFFERENCE SCORES BETWEEN
INITIAL LEARNING AND RETENTION SIX WEEKS

Source                  Sum of Squares    df    Mean Square

Treatments                   1,322.5                1,322.5
Levels                         479.4                  239.7
Treatments X Levels            220.3                  110.2
Within Sets                 11,904.8      84          141.7

Total                       13,927.0      89

* Significant F at .05 = 3.96     Significant F at .01 = 6.95


variance provided a significant chi-square of
5.258 (table value at .05 = 3.841 and at .01 =
6.635). Nevertheless, an F as large as 9.333
obtained from Table IV may be the result not only of
heterogeneity of variance, but also the result of a
significant difference in means.

A t-test can be performed even though the
variances are found to be heterogeneous. The
standard error of the difference in means is calculated
by using one-half the degrees of freedom available
(n - 1 rather than n_A + n_B - 2). A t-value of 3.054
was obtained using this procedure (table value at
.05 = 2.015 and at .01 = 2.691). Therefore, the
null hypothesis of ΔI6_A = ΔI6_B is not tenable.
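
One reading of the procedure described above is a t-test that keeps the two group variances separate in the standard error and then enters the table with n - 1 degrees of freedom instead of n_A + n_B - 2. The sketch below follows that reading; the article does not give the exact formula used, so the separate-variance standard error is an assumption, and the scores are invented.

```python
import math


def conservative_t(group_a, group_b):
    """t for a difference in means with separate variances.

    Returns (t, df), where df is the smaller group size minus one,
    a conservative choice when the variances are heterogeneous.
    """
    n_a, n_b = len(group_a), len(group_b)
    if min(n_a, n_b) < 2:
        raise ValueError("each group needs at least two scores")
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (n_a - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (n_b - 1)
    standard_error = math.sqrt(var_a / n_a + var_b / n_b)
    if standard_error == 0:
        raise ValueError("both groups have zero variance")
    return (mean_a - mean_b) / standard_error, min(n_a, n_b) - 1


# Hypothetical difference scores (not study data).
t_value, df = conservative_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

Halving the degrees of freedom raises the table value that t must exceed, so a result that is still significant, as the reported 3.054 is, cannot be credited to the unequal variances alone.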

Hence the ΔI1_A = ΔI1_B portion of Hypothesis II
is tenable. The ΔI6_A = ΔI6_B portion of the
research hypothesis must be rejected.

Test of Hypothesis III. Hypothesis III--There
is no difference in the ability to transfer effec- 
tively the knowledge acquired as measured one 
and six weeks after instruction between groups 
receiving Method A and Method B. 

Data were collected by means of the criterion 
transfer tests given one week and six weeks after 
instruction. The results of the analyses of var- 
iance are presented in Tables V and VI. 

In both Tables V and VI the interaction compo- 
nent of the analysis accounts for an insignificant 
portion of the variance between groups. The
variance associated with levels in both analyses is a
large part of the total variance. 

A significant difference in main treatments
effect is present both at one and six weeks after
instruction. At one week the F-value of 5.158 is
significant beyond the .05 point. Bartlett’s test
of homogeneity of variance produced an insignificant
chi-square of 1.003 (table value of chi-square
at .05 = 3.841). Therefore, the variation between
the variances of the two treatment groups is well
within the limits of random sampling from a
population with a common variance. Hence the null
hypothesis of σ²_A = σ²_B is tenable. It can be
concluded from this that the difference is in the
treatment means. The significant F provides
support for the inference that Method B is superior
to Method A with regard to a relatively wide
application of learning as measured one week
after instruction.

Six weeks after instruction the analysis of
variance supplies an F of 10.885 which is significant
beyond the .05 point. A chi-square of .678
obtained from Bartlett’s test of homogeneity of
variance is not significant (table value at .05 = 3.841).
Therefore, since the null hypothesis of σ²_A = σ²_B is
tenable, the basis for the significant value of F
must be the difference between the means of the
treatment groups.

Based on the data presented and the statistical
procedures used in analyzing these data,
Hypothesis III must be rejected. The alternative
hypothesis, that the mean scores of the treatment
groups are not equal, must be accepted. This
leads to the conclusion that Method B is superior
to Method A in producing transfer under the
conditions of this experiment.

Test of Hypothesis IV. Hypothesis IV--There
is no interaction between the two teaching methods
employed and the low, average, and high intellectual
levels with regard to initial learning, retention,
and transfer of learning.

Data pertaining to this research hypothesis
have been presented incidentally while testing
Hypotheses I through III. In every analysis of
variance performed, the treatments X levels or
interaction component of the variance between groups
has been insignificant. Therefore, in light of the
data collected under the conditions of this
experiment, the null hypothesis of zero interaction is
tenable.





Discussion 


It was thought prior to the experiment that the 
directed discovery method would be more effec- 
tive with the brighter pupils and the direct and de- 
tailed method more effective with the less able 
pupils. The complete absence in this experiment 
of significant interaction between teaching method 
and mental ability was an unexpected result. 

Of considerable interest is the shape of the 
curves depicting the retention of pupils taught by 
the directed discovery method. Even at a period 
six weeks after instruction, these curves do not 
indicate any appreciable loss of material learned. 

The following finding has implications for the 
design of future research of this type. Had not 
the difference scores method been employed, sig- 
nificant differences in retention would not have 
been discovered. That is, by simply analyzing 
retention data at one week and again at six weeks,
no difference in retention by treatment group was 
found. However, by comparing the difference be- 
tween scores made by individuals initially and at 
one and six weeks, significant differences were 
ascertained. 

It would appear that the directed discovery ap- 
proach to teaching and learning can be used with 
the pupil of low mental ability with most satisfac- 
tory results. It has been held by many teachers 
that careful direct and detailed presentation with 
numerous examples, followed by repetition and 
drill, is an effective way of teaching the slow 
learner. 

Teaching methods compared in this study are
rarely used in pure form. Good teaching makes 
use of selected approaches geared to specific 
teaching situations. The results of this study are 
applicable to the extent that each method compar- 
ed can be applied in practice in pure form. 

These findings support some prior studies and 
are in conflict with results of other experiments 
previously noted. It would seem to the writer





TABLE V

ANALYSIS OF VARIANCE OF LEARNING SCORES ON TRANSFER ONE WEEK TEST

Source                  Sum of Squares    df    Mean Square        F

Treatments                     440.0                  440.0      5.158*
Levels                       3,362.5                1,681.3     19.710**
Treatments X Levels             86.1                   43.1
Within Sets                  7,164.3      84           85.3

Total                       11,052.9      89

* Significant F at .05 = 3.96     Significant F at .01
** Significant F at .05 = 3.11    Significant F at .01


TABLE VI

ANALYSIS OF VARIANCE OF LEARNING SCORES ON TRANSFER SIX WEEKS TEST

Source                  Sum of Squares    df    Mean Square        F

Treatments                     877.3                  877.3     10.885*
Levels                       4,897.0                2,448.5     30.378**
Treatments X Levels               .6                     .3
Within Sets                  6,768.0      84           80.6

Total                       12,542.9      89

* Significant F at .05 = 3.96     Significant F at .01
** Significant F at .05 = 3.11    Significant F at .01


that replication of experiments must be encouraged
to build up a body of knowledge regarding
theory of teaching and learning. In many instances,
particularly in graduate research in education,
a taboo is placed upon the replication of
previous work.


Summary and Conclusions 





In summary the investigator found: 

1. With regard to initial learning, that there
was no statistically significant difference in
achievement between those Ss receiving Method A
or Method B.

2. With reference to retention, at one week, 
of material initially learned, that there was no 
statistically significant difference in retention be- 
tween Ss receiving Method A or Method B. 

3. With respect to retention of material initially
learned as determined six weeks after instruction,
that Ss taught by the directed discovery
method retained a statistically significant greater
proportion of this learning when compared to Ss
instructed by the direct and detailed method.

4. In regard to effective application or trans- 
fer of learning as determined one week after in- 
struction, that there was a statistically signifi- 
cant difference in transfer, between Ss receiving 
Method A and Method B, in favor of the directed 
discovery group. 

5. In relation to transfer six weeks after instruction,
that Ss taught by the directed discovery
method were more able to apply, to a statistically
significant degree, the learning acquired when
compared to application made by Ss given the
direct and detailed treatment.

6. In attempting to discover differential effectiveness
of direct and detailed and directed discovery
teaching methods with the low, average,
and high intellectual levels, that there was no
apparent interaction between teaching method and
intellectual level.

The following conclusions seem warranted: 

1. The direct and detailed and the directed 
discovery methods of teaching are equally effec- 
tive with regard to initial learning of micrometer 
principles and skills. 

2. The direct and detailed and directed dis- 
covery methods of teaching are equally effective 
with reference to retention of material initially 
learned as measured one week after instruction. 

3. The directed discovery approach to teaching
is superior to direct and detailed instruction
with respect to retention of material initially
learned as determined six weeks after instruction.


4. The directed discovery method of teaching
is more effective than the direct and detailed
approach in enabling pupils to make wide applications
of material learned to new and related situations,
both at one and six weeks after instruction.

5. There is no interaction of teaching method
and intellectual level.



ANALYSIS OF QUESTIONNAIRE RESPONSES
BY CERTAIN PSYCHOLOGISTS* 


ANDREW L. COMREY 
University of California at Los Angeles 


A PREVIOUS paper reported the results of a 
survey of psychologists listed in the 1951 APA 
directory as being males, employed at colleges 
or universities, and members of Phi Beta Kappa 
(1). The questions concerned work habits, per- 
sonal characteristics, interests, preferences, 
and attitudes. About ninety percent of the question- 
naires were returned. Each question was statis- 
tically related by means of the epsilon coefficient 
to several internal questionnaire criteria: inter- 
est in administrative work in psychology, indus- 
trial consulting, counseling and guidance, psycho- 
therapy, research, teaching, and publication rate. 
The last criterion measure was determined as a
ratio of the number of scientific articles and mon- 
ographs published to the years of full-time pro- 
fessional experience. 

The considerable number of significant rela- 
tionships discovered suggested that further anal- 
ysis might prove worthwhile. Accordingly, sixty 
of the most significant items were chosen to constitute
the variables for a factor analysis. A random
selection of 216 cases was taken for which
phi coefficients were computed between each of 
the 60 items and each other one. Each phi coef- 
ficient was divided by the maximum phi possible 
for the marginal totals concerned. Since most of
the items were multiple choice or completion 
items, it was necessary to dichotomize each one 
artificially. This was done as near the median 
as possible in each case. 
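
The phi over phi max coefficient described above can be illustrated for a single pair of dichotomized items. The sketch takes the four cell counts of a 2 x 2 table, computes phi, and divides it by the largest phi of the same sign that the observed marginal totals allow. Using a sign-specific maximum is an assumption made here for illustration; the article does not say how negative coefficients were handled, and the cell counts are invented.

```python
import math


def phi_for_cell(a, row_total, col_total, n):
    """Phi for a 2 x 2 table identified by its top-left cell and marginals."""
    b = row_total - a
    c = col_total - a
    d = n - row_total - col_total + a
    denominator = math.sqrt(
        row_total * (n - row_total) * col_total * (n - col_total)
    )
    if denominator == 0:
        raise ValueError("every marginal total must be non-zero")
    return (a * d - b * c) / denominator


def phi_over_phi_max(a, b, c, d):
    """Phi divided by the maximum phi attainable with the same marginals."""
    n = a + b + c + d
    row_total, col_total = a + b, a + c
    phi = phi_for_cell(a, row_total, col_total, n)
    if phi == 0:
        return 0.0
    if phi > 0:
        extreme_cell = min(row_total, col_total)        # pack the diagonal
    else:
        extreme_cell = max(0, row_total + col_total - n)  # empty the diagonal
    phi_limit = abs(phi_for_cell(extreme_cell, row_total, col_total, n))
    return phi / phi_limit


# Hypothetical counts: both "high" 40, high/low 10, low/high 20, both "low" 30.
ratio = phi_over_phi_max(40, 10, 20, 30)
```

Splitting each item near its median, as the text describes, keeps the marginal totals close to even, which in turn keeps phi max close to 1 and the correction small.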

Seventeen centroid factors were extracted from 
the matrix of phi over phi max coefficients. The 
two highest loadings on the last factor were .22
and .21, respectively. No loading of .3 or more
occurred after the ninth factor. The 17 centroid 
factors were rotated analytically by Kaiser’s Var- 
imax method (2). This procedure tends to maxi- 
mize the variance of the squared extended vector 
projections, which usually yields an orthogonal 
simple structure solution if one exists in the 
data. ** 


Results 


*All footnotes will be found at end of article.

Each factor will be presented with those variables
for which the loadings were .3 or more.
The factor number and proposed name will be fol- 
lowed by a figure in parentheses which gives the 
number of loadings between -.1 and .1 to indicate
hyperplane density. After a brief statement about 
the factor, the item number and loadings will be 
given, opposite which will appear a description of 
the variable involved. Each variable is usually a 
dichotomized multiple choice or completion item, 
but only a question without the alternatives will be 
presented. It is understood that such question- 
naire responses as ‘‘yes,’’ ‘‘quite a bit,’’ and so 
on, are toward the positive side of the variable, 
despite the fact that the precise alternatives will 
not be given here. 
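
The two quantities reported for each factor, the hyperplane count and the list of salient items, can be sketched as follows. The loadings shown are invented, not taken from the rotated matrix; treating the -.1 to .1 bounds as inclusive and applying the .3 cutoff to absolute values (so that negative loadings such as -.41 qualify) are assumptions made for the sketch.

```python
def hyperplane_count(loadings, bound=0.1):
    """Number of loadings lying between -bound and +bound inclusive."""
    return sum(1 for loading in loadings if -bound <= loading <= bound)


def salient_items(loadings_by_item, cutoff=0.3):
    """Items whose loading is at least the cutoff in absolute value,
    ordered from largest to smallest absolute loading."""
    kept = {
        item: loading
        for item, loading in loadings_by_item.items()
        if abs(loading) >= cutoff
    }
    return sorted(kept.items(), key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical loadings of five items on one factor.
example_factor = {71: 0.70, 70: 0.47, 10: -0.41, 5: 0.05, 7: -0.08}

density = hyperplane_count(example_factor.values())
listed = salient_items(example_factor)
```

A higher hyperplane count means more items sit near zero on the factor, which is the sense in which the parenthesized figure indicates how cleanly the factor is defined.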

Factors I, VII, and X were minor factors with 
only one or fewer loadings over .4; hence no
tempt will be made to interpret them. 

II. Administrative Interest (35). The two items
of major importance on this factor clearly identify 
it as being concerned with the individual’s desire 
to participate in administration. 





Description 
How much would you enjoy be- 
ing head of the psychology de- 
partment at a major university? 
.95 Assuming you were qualified,
how much would (do) you enjoy
doing administrative work in
psychology?
.31 How much do you enjoy working
on group projects?


Item Loading 
45 . 56 


III. Teaching Interest (40). Item 57 was one of 
the internal criteria analyzed in the previous study
and helped to define factor I. Item 62 also was 
such an internal criterion and represents an im- 
portant defining variable for this factor. 





Item Loading 
36 lw 6B 


Description 
Would you object to a position, 
otherwise desirable, if you had 
no classes to teach? 
Assuming you were qualified, 
how much would (do) you enjoy 


62 .60
doing teaching?

How much would you like a 
well-paid teaching position in 
a small liberal arts college of 
good reputation in a small
community ? 

Assuming you were qualified, 
how much would (do) you enjoy 
doing industrial consulting? 


IV. Moralistic Orientation (35). These items
embody a rigid adherence to established values,
particularly in the religious and sexual area. It
is noteworthy that in this sample ‘‘importance of
religion as a force in the individual’s life’’ should
be so closely associated with intolerance of moral
and philosophical deviates. Whether this would be
true in a more representative sample of the general
population would be difficult to say.


Description 
How much would you resent a 
colleague who was an atheist? 
. 13 How much would you resent a 
homosexual ? 
. 62 How much of a positive force 
is religion in your life? 
.56 How much would you resent a
colleague who was an adulterer?
How much would you resent a 
colleague who was a member 
of the communist party? 
How would you describe your 
basic political philosophy (lib- 
eralism vs. conservatism) ? 


Item Loading 
69 . 15 


V. Drive (41). The exact nature of this factor 
is less clearly outlined than is the case with some 
of the others, but the hypothesized identity seems 
reasonable. The title is intended to mean some- 
thing like energy level. 


Description 
How much difficulty do you have 
in writing papers, speeches, 
etc. ? 
. 48 In comparison with other men, 
how much interest in sex have 
you had during your life? 
Are you at ease in social situ- 
ations ? 
Would you mind a job which re- 
quired you to be at a particular 
place 40 hours per week? 
How much would you like a 
well-paid teaching position in 
a small liberal arts college of 
good reputation in a small com- 
munity ? 


Item Loading 
54 -. 58 





VI. Seniority (38). These variables all fit well 
into the seniority category since each is related to 
length of service. 
Item Loading Description 
1 . 85 Age. 

3 . 81 Years of full-time profession- 
al experience. 

About how many scientific ar- 
ticles and monographs have 
you published, or had accepted 
for publication? 

How much harder do you work 
than would be necessary to 
‘get by’’? 


20 . 54 


VIII. Service Orientation (38). Psychologists
are thought by the layman to be devoted to ‘‘helping
people,’’ but these results make it clear that
variance exists among psychologists in the extent
to which they are oriented in this particular
direction. It is interesting that neither counseling
nor psychotherapy interest proved to be the dominant
variable on any factor but rather these two
joined with teaching interest as somewhat lesser
variables on this factor of broader character.
This would tend to suggest that considerations
other than the desire to help people are also
important in determining interest in counseling and
psychotherapy.





Description 
How much do you want to help 
people who need it? 
How much do you want to do
something for society ? 
To what extent would you allow 
graduate students’ expressed
interests to determine the con- 
tent of a graduate course? 
Assuming you were qualified, 
how much would (do) you enjoy 
doing counseling and guidance? 


Item Loading 
67 va 


65 . 62 


48 . 44 


Assuming you were qualified, 
how much would (do) you enjoy 
doing teaching? 

Assuming you were qualified, 
how much would (do) you enjoy 
doing psychotherapy ? 

How much weight would you 
place on published research in 
evaluating the merit of a psy- 
chologist ? 


IX. Resentment of Plagiarism (44). Probably
something a bit broader than the title would suggest
is involved here, but it is difficult to say
precisely what its nature is.

TABLE II
CENTROID LOADINGS

TABLE III
ROTATED LOADINGS





Item   Loading   Description
 71      .70     How much would you resent a colleague who was a plagiarist?
 70      .47     How much would you resent a colleague who was a member
                 of the communist party?
 10     -.41     About how many evening hours, if any, do you spend on
                 professional work in an average week?


XI. Research Productivity (27). Variable 74,
which represented the ratio of number of publications
to years of professional service, was
also one of the internal criteria used in the previous
analysis. An interesting question which invites
speculation is whether items 24 and
25 represent partial cause and effect relationships
with item 74, and, if so, which is cause
and which is effect.





Description 
Production rate (ratio of arti- 
cles to years of service). 
Assuming you were qualified, 
how much would (do) you enjoy 
doing research? 

How much of your profession-
al career has been spent in 
positions where you were ex- 
pected to do research? 
Would you enjoy a job where 
40 hours per week were de- 
voted to teaching and prepar- 
ation? 

Are you now employed by an 
institution which grants the 
Ph. D. in psychology? 

About how many scientific ar- 
ticles and monographs have 
you published, or had accept- 
ed for publication? 

How much do you want to have
national recognition as a psy- 
chologist ? 

How much influence do you 
expect your work to have on 
the development of psychology 
in the United States ? 
Assuming you were qualified, 
how much would (do) you enjoy 
doing counseling and guidance? 
To how many professional 
journals do you subscribe? 
How much weight would you 
place on published research 
in evaluating the merit of a 
psychologist? 

Do most major universities 
place too much emphasis on 


Item Loading 
74 . 67 


61 . 66 


25 . 61 





research? 

How much emphasis should be 
placed on research methods in 
training clinical psycholo- 
gists? 

How much would you like a 
well-paid teaching position in 
a small liberal arts college of 
good reputation in a small
community?

About what annual income do 
you expect to reach? 

About what percentage of psy- 
chologists measure up to your 
standards as to what a psychol- 
ogist should be? 


XII. Competitive Personnel Policy (37). The
individual high on this factor is likely to feel that
some psychologists are better than others and
should be rewarded for it. He is also more active
in attending professional gatherings.





Description 
Should an outstanding man be
promoted sooner than he other- 
wise would be if he received a 
better offer somewhere else? 
About what percentage of psy- 
chologists measure up to your 
standards as to what a psy- 
chologist should be? 
How many regional and nation- 
al psychology meetings did you 
attend in 1950, 1951, and 1952? 


Item Loading 
27 . 48 


XIII. Entertaining (45). That item 18 should 
have a negative loading and item 16 a positive one 
seems reasonable in view of the fact that one cannot
be two places at the same time. On the other
hand, one would think that frequent home enter- 
taining might call for frequent social ventures be- 
yond the threshold too. Perhaps item 18 was in- 
terpreted by the respondents more in terms of 
public places of amusement. 


Item   Loading   Description
 16      .67     About how many evenings in an average month do you
                 entertain friends in your home?
 18     -.48     About how many evenings in an average month do you go
                 away from home for an evening’s entertainment?
 45      .30     How much do you read for recreational purposes?


XIV. Materialistic Orientation (44). In one
way or another, each item in this group has something
to do with activities or events likely to enhance
the financial position. Thus, the factor
seems to be concerned with the extent to which
great value is placed on money.


Description 

Could you be interested in a
business career if a really
good opportunity came along?

. 48 If you were to receive a large 
inheritance, would you give up 
your job? 

.47 How much do you want to have
quite a bit of money? 

. 37 Assuming you were qualified, 
how much would (do) you enjoy 
doing industrial consulting? 

. 36 Would you like to direct a re- 

search project involving a 

staff of 100 people or more? 


Item Loading 
23 62 


XV. Hedonism (42). These items seem to 
characterize the highly verbal, fun-loving scholar 
who refuses to take himself or psychology very 
seriously. He feels that his duty is to enjoy life 
for himself and also for those other psychologists 
who are so busy or preoccupied that they haven’t 
time to do so, or perhaps don’t know how. 


Item   Loading   Description
 50      .47     How much time do you spend in ‘‘socializing’’ during
                 working hours?
 44      .44     How much do you like bibliographic work?
 45      .41     How much do you read for recreational purposes?
 53     -.40     How much harder do you work than would be necessary to
                 ‘‘get by’’?


XVI. Introversion (35). Two of these items
place great importance on research and research
training while the other finds being alone not
unpleasant. The most obvious common denominator
in these responses appears to be a type of
self-contained, as opposed to people-oriented,
personality.


Description

How much emphasis should be placed on research methods in training clinical psychologists?
How much weight would you place on published research in evaluating the merit of a psychologist?
Do you find it unpleasant to be alone?


Item   Loading
35     .46
33     -.38


XVII. Collectivism (45). Many individuals in
psychology emphasize harmonious merging into
the team where each cooperative person is accept-
ed and there is a minimum emphasis on personal
evaluation. Self-seeking by energetic and ambi-
tious group members is frowned upon as contrary
to group goals. Cooperation replaces competition.
The items loading on this factor appear to define
such a general point of view.


Item   Loading   Description

32     .47       How would you describe your basic philosophy (liberal vs. conservative)?
38     .44       Do you believe that life in our society is too competitive?
5      -.40      To how many professional journals do you subscribe?
69     -.39      How much would you resent a colleague who was an atheist?
7      -.34      How many regional and national psychology meetings did you attend in 1950, 1951, 1952?


Discussion 


Several of the criteria selected for analysis in 
the original published study emerged here as inde- 
pendent factors. Administration and teaching in- 
terest variables defined two factors individually 
while research interest and publication rate de- 
fined a third. Interest in industrial consulting 
failed to define a factor itself but had a substan- 
tial loading on Materialistic Orientation. The two 
remaining criteria, interest in psychotherapy and 
interest in counseling and guidance, showed up to- 
gether on the Service Orientation factor, although 
with lower loadings than one might expect. Other 
variables not originally conceived as criterion 
measures proved to be important in factor defini- 
tion. 

With respect to the universe of discourse un- 
der examination, it would appear that the most 
significant independent dimensions along which 
these psychologists differ are: interest in admin- 
istration, interest in teaching, moralistic orien- 
tation, seniority, desire to help others, research 
productivity, and materialistic orientation. It is 
interesting that each and every one of these fac- 
tors represents an area frequently involved in 
personnel action and policy decisions. It is also 
interesting and noteworthy that these factors should 
have emerged from the application of an entirely 
analytic rotation criterion. 


Summary 


A factor analysis was carried out on 60 items 
from a questionnaire given to males listed in the 
1951 APA directory as Phi Beta Kappa members. 
The questions concerned work habits, personal 
characteristics, interests, preferences, and atti- 
tudes. Seventeen centroid factors were extracted 
from the matrix of phi over phi max coefficients.
Analytic rotations were carried out using Kaiser’s 
Varimax method. Factors of sufficient importance 
to warrant naming were: Administrative Interest, 
Teaching Interest, Moralistic Orientation, Drive, 
Seniority, Service Orientation, Resentment of 
Plagiarism, Dislike for Mathematics, Research 
Productivity, Competitive Personnel Policy, En- 
tertaining, Materialistic Orientation, Hedonism, 
and Collectivism. 


FOOTNOTES 


* This study was made possible primarily be-
cause of the availability of SWAC, an electron-
ic computer operated by Numerical Analysis
Research of the University of California, Los 
Angeles, and supported by the Office of Naval 
Research. The opinions expressed are the au- 
thor’s. Further financial support for this work 
came from a research grant by the University 
of California. 


**The tables of correlation coefficients, centroid
loadings, and rotated loadings have been depos-
ited with the American Documentation Institute.
Order Document No. 4858 from the ADI Auxili-
ary Publications Project, Photoduplication Ser-
vice, Library of Congress, Washington 25, D.
C., remitting in advance $1.25 for 35mm. mi-
crofilm or for 6x8 in. photocopies
readable without optical aid. Make check pay-
able to Chief, Photoduplication Service, Library
of Congress. A copy of the complete question-
naire together with the response frequencies
also has been deposited.


REFERENCES 


1. Comrey, Andrew L. ‘‘Publication Rate and
Interests in Certain Psychologists,’’ Ameri-
can Psychologist, XI (1956), pp. 314-22.


2. Kaiser, Henry F. ‘‘The Varimax Criterion
for Analytic Rotation in Factor Analysis,’’
Psychometrika, XIX (1954), pp. 173-82.











JOURNAL OF EXPERIMENTAL EDUCATION 
(Volume 29, Number 3, March 1961) 


APTITUDE AND COURSE ACHIEVEMENT IN A 
SHORT MISSILE TECHNICIAN COURSE* 


HARRY E. ANDERSON, Jr., and FRANK GUENTER** 
USAADHRU, HumRRO 


Introduction 


THIS STUDY seeks to determine limited apti-
tude characteristics of students in a short missile
technician’s course, and the general relationship 
of these aptitude characteristics to various parts 
of the course and the final course average. The 
Personnel Research Branch (PRB) of the Adju- 
tant General’s Office, Department of the Army, 
has put forth much effort to stabilize and stand-
ardize measures of aptitude characteristics.
HumRRO has been concerned with training prob- 
lems and the assessment and improvement of 
training circumstances. The purpose of the pres- 
ent study is to assess academic achievement of 
missile technicians in terms of their aptitude 
characteristics, and suggest concrete referents 
for further training research in this area. 

The ten aptitude tests in the present Army 
Classification Battery (ACB), which are adminis- 
tered to most enlisted personnel when they enter 
service, became operational in 1949 (10). Since 
that time, PRB has continually reviewed and an-
alyzed these measures (e.g., 2, 3, 10, 11) to en-
sure their stability and enhance their usefulness
as predictive measures of success in various 
army occupational specialities. Though research 
is presently programmed for the ACB in the mis- 
sile field, such information is not now available. 

One major interest in HumRRO’s training re-
search has resided in comparing course achieve-
ment with later proficiency in the field. An ex-
emplary study (9) determined the major opera-
tional functions for missile maintenance person-
nel, built proficiency and written tests to meas-
ure ability regarding these functions, and com-
pared recent school graduates with experienced
field personnel on the written and proficiency tests
as well as several other variables. The aptitudes
of the two major groups (i.e., the recent gradu-
ates and the field-experienced personnel) were
not compared with regard to most of their ACB
scores. Aptitude measures in this study were
comprised of an electronic score (i.e., a com-
pound of two ACB scores) and the Electronic


*All footnotes will be found at end of article. 








Placement Test¹: aptitude, as measured by these
variables, correlated to some extent with written
test scores but demonstrated very little relation-
ship with the proficiency test scores. Now, it is
not reasonable that aptitude should be unrelated to
proficiency (i.e., ability); therefore, a more ex-
tensive study of aptitude characteristics of such
groups might well prove rewarding from both the
situational and measurement standpoint.

Another HumRRO study (1) investigated the re-
lationship of academic achievement in various
parts of a course. But again, no extensive use
was made of aptitude as related to the various
parts of the course.


Method 


The Course—The course under investigation
was a five and one-half week technician course
conducted at the U.S. Army Air Defense School,
Ft. Bliss, Texas. The 220 hours in the course
were used as follows: 1) 21 hours for Missile Me-
chanic (MM) training; 2) 34 hours for Launcher
Area (LA) training; 3) 85 hours for Missile Prep-
aration and Depreparation (MPD) training; 4) 80
hours for non-academic functions, distributed
variously throughout the course (e.g., incoming
and outgoing processing time, physical training,
etc.). The course is, essentially, a transition-
type course following a much longer technician
course.

A written examination was given after each of
the three main parts (i.e., in this order, MM,
LA, and MPD) of the course. A total course av-
erage (TCA) is computed by weighting the first
two parts of the course one each, and assigning a
weight of two to the last, longer part of the course.
There were four classes, including a total of 119
enlisted trainees, for whom complete data were
available.
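
The weighting just described can be sketched in a few lines. The text gives only the weights (one, one, and two); dividing by the sum of the weights is an assumption made here so that the result stays on the same scale as the part scores, and the function name and the sample scores are hypothetical.

```python
def total_course_average(mm, la, mpd, weights=(1.0, 1.0, 2.0)):
    """Weighted mean of the three part scores.

    The weights (1, 1, 2) come from the text; normalizing by their
    sum is an assumption, not something the source states.
    """
    scores = (mm, la, mpd)
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical part scores for one trainee.
tca = total_course_average(80, 90, 70)
```

With these weights MPD counts as much as MM and LA together, which matches the remark later in the article.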

The ACB Tests—Eight of the ten ACB test scores
were available for the 119 trainees as follows:





Reading and Vocabulary (RV) 
Arithmetic Reasoning (AR) 
Pattern Analysis (PA)
Mechanical Aptitude (MA) 
Army Clerical Speed (ACS) 
Shop Mechanics (SM) 
Automotive Information (AI) 
Electronic Information (EI) 


The remaining two ACB tests (i.e., Army Radio
Code Aptitude and Radio Information) were not in-
cluded in the analysis because of scoring difficul-
ties and lack of data. A complete description of
these variables can be found in a previous publi-
cation (3) and in various other PRB studies.

The Analysis—The eight ACB test scores and 
the four course scores, including the TCA, were 
intercorrelated for each class separately, and 
the correlations combined by a weighted average 
using Fisher’s z-transformations (e.g., 6:325-6).
The resulting correlations between the ACB 
scores were factored by the centroid method, and 
the centroid factors were rotated graphically. 
The four course scores were then projected onto 
the rotated factor structure. 
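
As an illustration of the pooling step, the sketch below averages one correlation across classes through Fisher's z. The source does not say which weights were used; weighting each class by n - 3 (the reciprocal of the sampling variance of z) is the conventional choice and is an assumption here. The class sizes are hypothetical, chosen only so that they total 119.

```python
import math


def fisher_weighted_average(rs, ns):
    """Pool correlations from several groups via Fisher's z.

    rs: correlations, one per class; ns: class sizes.
    Weights of (n - 3) are assumed; the source does not state them.
    """
    if len(rs) != len(ns):
        raise ValueError("rs and ns must have the same length")
    weights = [n - 3 for n in ns]
    if any(w <= 0 for w in weights):
        raise ValueError("each class needs more than 3 cases")
    z_bar = sum(w * math.atanh(r) for w, r in zip(weights, rs)) / sum(weights)
    return math.tanh(z_bar)  # back-transform the mean z to r


# Hypothetical class sizes (they sum to 119) and correlations.
pooled = fisher_weighted_average([0.42, 0.55, 0.38, 0.47], [28, 30, 31, 30])
```

Applying this to every off-diagonal cell of the four class matrices would give one pooled correlation matrix of the kind the article factors.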


Results 

The intercorrelations among the eight vari-
ables are presented in the lower half of the cen-
tral portion of Table I, while the upper half con-
tains the residuals after the extraction of two fac-
tors. The right-hand portion of Table I presents
the correlations of each of the four course scores
with each of the eight ACB variables. The means
and standard deviations for the eight ACB vari-
ables are also presented, to the left of the central
portion of Table I.

The Factor Analysis—The 8 x 8 intercorrela-
tion matrix in Table I was factored by the cen-
troid method, the communalities being estimated
as the highest correlation in each column, re-
spectively. Actually, four factors were extracted
in a preliminary analysis, but graphic rotations
on the Zimmerman apparatus failed to produce
anything on the third and fourth factors. Conse-
quently, only the first two centroids were used in
the final analysis; these factors are presented in
Table II. The reader will note that the first cen-
troid has projected variances equal to 3.4420; the
second centroid, .7742. The total variance, us-
ing communalities in the diagonals, is equal to
4.8000. The two centroids, therefore, account
for approximately 85 percent of the variance in
the system.
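
For readers unfamiliar with the centroid method, the following is a minimal sketch of extracting one centroid factor, under stated assumptions. It expects a symmetric matrix with communality estimates already placed in the diagonal. The reflection rule used (flip the variable whose off-diagonal column sum is most negative until none is negative) is one simple textbook variant and is not claimed to be the exact hand procedure the authors followed. The function name and the toy matrix are hypothetical.

```python
import numpy as np


def centroid_factor(R, tol=1e-12):
    """Extract one centroid factor from a symmetric matrix R.

    R should carry communality estimates in its diagonal.
    Returns (loadings, residual matrix).
    """
    R = np.asarray(R, dtype=float)
    signs = np.ones(R.shape[0])
    while True:
        Rs = R * np.outer(signs, signs)            # reflected matrix
        off_sums = Rs.sum(axis=0) - np.diag(Rs)    # column sums without diagonal
        j = int(np.argmin(off_sums))
        if off_sums[j] >= -tol:
            break
        signs[j] = -signs[j]                       # reflect variable j
    total = Rs.sum()
    if total <= 0:
        raise ValueError("no variance left to extract")
    # Loadings are column sums over the square root of the grand sum,
    # with the reflections undone afterwards.
    loadings = signs * Rs.sum(axis=0) / np.sqrt(total)
    residual = R - np.outer(loadings, loadings)
    return loadings, residual


# Toy one-factor matrix (hypothetical, not the article's Table I).
a = np.array([0.8, 0.7, 0.6])
first_loadings, first_residual = centroid_factor(np.outer(a, a))
```

A second factor would be obtained by calling the function again on the residual matrix; the sum of squared loadings for each factor corresponds to the "projected variances" quoted in the text.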

The two centroid vectors were rotated graph- 
ically through a clockwise angle of 41° 30'. The 
rotated loadings are presented in Table III. 
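
The rotation itself is a single orthogonal transformation of the two columns of centroid loadings. The sketch below assumes that "clockwise" refers to turning the reference axes clockwise, so that each variable's coordinates turn counterclockwise relative to the new axes; the source does not spell out its sign convention, and the loadings used are hypothetical.

```python
import numpy as np


def rotate_axes_clockwise(loadings, degrees, minutes=0.0):
    """Rotate a two-column loading matrix by turning the axes clockwise.

    The sign convention is an assumption; an orthogonal rotation of
    this kind leaves every communality unchanged.
    """
    theta = np.deg2rad(degrees + minutes / 60.0)
    c, s = np.cos(theta), np.sin(theta)
    T = np.array([[c, s],
                  [-s, c]])
    return np.asarray(loadings, dtype=float) @ T


# Hypothetical centroid loadings: a general first factor, a bipolar second.
centroid = np.array([[0.80, -0.40],
                     [0.70, 0.45]])
rotated = rotate_axes_clockwise(centroid, 41, 30)
```

Under this convention a variable with a negative second centroid loading ends up mainly on the first rotated factor, and one with a positive second loading mainly on the second.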

The first factor is a Technical factor deter-
mined, for the most part, by MA (.718), SM
(.751), AI (.614), and EI (.768); RV and PA also
have fairly high loadings on this factor, suggest-
ing some reading and spatial characteristics in
the Technical factor. The second factor is an Ac-
ademic factor with RV (.596), AR (.753), PA
(.612), and ACS (.513); MA and SM also have ap-
preciable loadings on this factor, indicating some
technical aspects of the factor.

The ACB and the Course Tests—The four
course tests were projected onto the rotated fac-
tor structure (5:209-19), and the resulting load-
ings are presented in Table IV. The second test,
LA, and the final course average, TCA, have
greater loadings on the Technical factor than on
the Academic factor. The reverse is true for the
first test, MM; it has more relation to the Aca-
demic factor than to the Technical factor. The
third test in the course, MPD, which represents
the greater amount of instructional time and is
given the most weight in the computation of the
final course average, does not seem to have much
in common with the system, but most of the attend-
ing communality is located on the Technical fac-
tor. Several inferences are available from this
analysis, as will be presented in the Discussion
section.
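
The source cites (5:209-19) for the projection and does not reproduce the computation. One common way to place an outside variable in an existing orthogonal factor space is a least-squares fit of its correlations with the factored tests on their loadings; the sketch below uses that formulation as an assumption, and it is not necessarily the cited procedure. All numbers are hypothetical.

```python
import numpy as np


def project_onto_factors(A, r):
    """Estimate factor loadings for a variable outside the factored set.

    A: (tests x factors) rotated loadings of the factored tests.
    r: correlations of the outside variable with those tests.
    Solves r ~ A @ b for b in the least-squares sense.
    """
    A = np.asarray(A, dtype=float)
    r = np.asarray(r, dtype=float)
    b, *_ = np.linalg.lstsq(A, r, rcond=None)
    return b


# Hypothetical rotated loadings for four tests on two factors.
A = np.array([[0.7, 0.2],
              [0.6, 0.3],
              [0.2, 0.7],
              [0.1, 0.6]])
true_b = np.array([0.5, 0.25])
r = A @ true_b              # correlations implied by the two-factor model
estimated_b = project_onto_factors(A, r)
```

When the correlations fit the two-factor model exactly, as in this toy case, the fit recovers the loadings; with real data the residual of the fit shows how much of the outside variable lies outside the factor space.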





Discussion 


The ACB Scores—The results of the factor an-
alysis indicate that aptitude characteristics of
missile technician trainees with respect to the
ACB tests are of two major, distinct types, viz.,
academic and technical. This outcome supports
the methodology used several times in previous
HumRRO research (9), the use of both written and
practical tests as measures of proficiency. The
use of both types of tests may be required to tap
both types of aptitudes and abilities of which the
technicians show definite capabilities. The result
also lends support to the U. S. Army Air Defense
School’s policy of using written and practical ex-
aminations in most of their courses for techni-
cians; logically it would seem that, if both types
of examinations are given, success on practical
examinations will be related to technical aptitudes
while success on the written examinations will de-
pend to some extent on both types of aptitudes.
Hence, a student’s exhibition of content absorp-
tion throughout the technical courses in the U. S.
Army Air Defense School will depend on his tech-
nical aptitudes and, to a more limited extent, on
his academic-type aptitudes. This is not to say
that a given course should be built towards apti-
tude characteristics of the students, but only that
the measurement of learning throughout any course
should take into account the goals of the course,
the nature of the program, and the post-graduate
demands that will be placed on the trainees. Fur-
ther studies regarding aptitude characteristics of
successful field-experienced technicians should
indicate the propriety of measurement character-
istics in the School’s testing program.

TABLE II

CENTROID FACTOR LOADINGS

[Table body not recoverable from the scan. Surviving fragments: the loadings .190 and .462; the row labels AI, EI, and Sums of Squares.]


TABLE III

ROTATED FACTOR LOADINGS

[Table body not recoverable from the scan. Surviving row labels: PA, MA, ACS, SM, AI, EI, and Sums of Squares.]


TABLE IV

PROJECTED FACTOR LOADINGS OF THE FOUR COURSE
VARIABLES ON THE ROTATED FACTOR STRUCTURE

Course
Variables    Technical    Academic

MM           .251         .498

[Remaining rows not recoverable from the scan; one stray value, .314, survives.]


TABLE V

PRINCIPAL COMPONENT LOADINGS WITH COMPUTED
CENTROID COMMUNALITIES IN THE
PRINCIPAL DIAGONAL

[Table body not recoverable from the scan. Surviving fragments: a Factor heading with a communality column, and the values .542, .595, .561, .646, .286.]


The above discussion and the dichotomy of ap-
titudes among missile technicians suggest an-
other interesting inference. Proficiency differ-
ences in various areas of missile technology may
be due to extreme emphasis on one or the other
aptitude. The ability to read schematic diagrams,
for instance, is probably more closely related to
academic aptitude while replacing a malfunction-
ing component in a given missile system is prob-
ably more closely related to technical aptitude.
Differential performances in these two activities
might well be attributable to differences in apti-
tude characteristics demonstrable with the ACB
variables. A more thorough study of aptitude
characteristics with regard to proficiency would
probably repay further research in any case;
most certainly, such research would clarify the
nature of post-graduation job demands that will be
required of missile technician trainees.

The sample of technician trainees seems well
above average on all eight aptitude measures used
in this study. The means and standard deviations
obtained on a larger, random sample of enlisted
men were presented in a previous study (10:4):


                 Standard
Test    Mean     Deviation
RV      99.4
AR      94.7
PA      99.6
MA      102.
ACS     88.
SM      100.
AI      99.
EI      100.

[The final digits of several means and the entire standard deviation column are not legible in the scan.]


An examination of similar data to the left of Table 
I reveals that the missile technician trainees not 
only score higher, on the average, than the gen-
eral population of enlisted men, but they also tend
to vary less about the mean than the general pop- 
ulation of enlisted men. All in all, the missile 
technician trainees seem to be above average in 
every respect with regard to aptitudes. 

There is in the results of the present study an
interesting anomaly as compared to previous
classifications of ACB tests. A previous study
(11:19) refers to the Mechanical Aptitude test as
a measure of ‘‘general aptitude,’’ while the Army
Clerical Speed test is referred to as ‘‘...a more
specific measure peculiar to the job family...;’’
otherwise, our Academic factor contains the gen-
eral measures and the Technical factor contains
the more specific measures. True, MA shows
some appreciable relation to the Academic factor
but it plays a relatively small part in determining
the location of this vector; moreover, the contri-
bution of ACS to the Technical factor is negligible.
Apparently, the academic and technical aptitude
characteristics of missile technicians are suffi-
ciently disparate to locate each of these two tests
with others of their more general nature. The MA
variable is more closely associated with technical
aptitude while the ACS variable has more relation-
ship to general, academic aptitude. As was noted
previously, however, the missile technician train-
ees are above average on both variables as well as
all the rest of the ACB variables used in this study.

Aptitude and the Course Variables—Two of the
course variables, LA and TCA, each show about
twice as much relationship² to the Technical fac-
tor as compared to the Academic factor. This re-
sult appears to be most reasonable. Surely, a
technical training program can be expected to ex-
ploit the technical talents of its trainees within
some amount of academic framework, but it would
seem that technical aptitudes should be the most
important factor in determining a student’s suc-
cess or failure. From this standpoint, it is en-
couraging that the total course average, which in-
dicates the student’s overall standing, and, in this
instance, is made up entirely of written test
scores, is determined for the most part by techni-
cal aptitudes rather than academic aptitudes, though
the latter receive ample consideration.

Missile Mechanics is the only course variable
more closely associated with academic-type apti-
tudes than with those aptitudes associated with
technical prowess. An explanation of this result
might reside in the fact that MM is the first por-
tion of the course when students are becoming ac-
quainted with new material; in this circumstance,
academic aptitudes might have a larger play in test
scores than technical aptitudes. If this is the case,
similar results should be obtainable in research
with ACB scores and other technical courses. Fur-
ther studies, of course, might show other reasons
for the behavior of MM in this study, but it is not
immediately obvious why MM demonstrates about
four times as much relationship to academic, gen-
eral aptitudes as to technical aptitudes. Further
research here should prove quite fruitful.
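
The "four times" figure can be checked from the MM row of Table IV (.251 on the Technical factor, .498 on the Academic factor), taking "relationship" as the squared loading, as the article's footnote defines it. The short sketch below does that arithmetic; the function name is hypothetical.

```python
def variance_shares(technical, academic):
    """Squared loadings on two orthogonal factors and their sum."""
    t2 = technical ** 2
    a2 = academic ** 2
    return {"technical": t2, "academic": a2, "communality": t2 + a2}


# Loadings for MM as printed in Table IV.
mm = variance_shares(0.251, 0.498)
ratio = mm["academic"] / mm["technical"]   # a little under 4
```

The squared loadings are about .063 and .248, so the Academic share is just under four times the Technical share, and the two together come to about .31 of the variance in MM.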

From an inspection of Table IV it seems note-
worthy that with the exception of MPD, the factor
structure generated by the eight ACB aptitude vari-
ables accounts for about 30 percent of each of the
course variable’s variance. The lack of relation-
ship drawn between the aptitudes and MPD most
certainly bears closer scrutiny. Missile Prepar-
ation and Depreparation consumes more than half
of the instructional time, and receives as much
weight in the TCA as both MM and LA together.
Still, notwithstanding the importance of the MPD,
this study has told us relatively little about the un-
derlying aptitude characteristics associated with
this variable. Almost all of MPD’s communality
is located on the Technical factor, but a more
thorough study of this variable should be under-
taken in future training research to better under-
stand its nature.

Though the ACB tests proved quite useful in the
present study, it should be noted that none of these
variables correlated with school grades as well as
did the Electronic Placement Test in a previous
HumRRO study (1). This result suggests that lo-
cally developed tests might well be more closely
associated with success in various technical train-
ing courses. Certainly, more research should
be conducted to combine army-wide aptitude tests
with locally built tests in an effort to obtain a bet-
ter understanding of aptitude characteristics as
related to course achievement and job profi-
ciency.

Methodology—Some remarks regarding the
methodology in the present study and that in future
training research seem to be in order.

Questions regarding the relationships of two
sets of variates often arise in training research.
We can, of course, canonically correlate the two
sets of data and develop the most predictable cri-
terion. Very often, however, as in the present
study, we are interested in a parsimonious de-
scription of one set of variables, and in the relation
of each of a second set of variables to the first set.
The method of factoring the first set of variables
and projecting the second set onto the resulting
structure seems to be a convenient approach
in explaining such relationships in the data.

In problems involving differential prediction,
we probably cannot do better than distance func-
tions and canonical variates, or perhaps the use
of some derived measure such as centour scores
(8) related to Mahalanobis’ D² statistic. But for
exploratory studies, such as the present one, the
optimum properties of a principal component-type
solution do not necessarily provide a clear pic-
ture. Kendall (7:28) purports that methods such
as the centroid technique ‘‘... are objectionable
and should not be used when they can be avoided’’,
but we know from Danford’s work, for instance,
that the centroid, simple summation technique,
depending on reflections, bears more relation to
analysis of variance than the principal component,
weighted summation method. Moreover, it seems
most advantageous in exploratory studies to sep-
arate variables as much as possible in the factor
space, and it appears convenient to accomplish
this result via rotated factor solutions.

Certain results, of course, can be supported
by principal component solutions. A principal
component analysis of the 8 x 8 intercorrelation
matrix in Table I, using communalities computed
from the centroid solution, produced the results
presented in Table V. Again, as in the centroid
solution, the first two components together ac-
count for approximately 90 percent of the vari-
ance. Here, the first component reflects high
loadings for all eight ACB tests, while the second
component reveals a separation of variables sim-
ilar to that in the rotated centroid solution. Only
about 15 percent of the variance in the principal
component solution, however, allows for this dif-
ferentiation among the variables. The rotated
centroid solution, then, appears to provide a
greater differentiation among the variables in
terms of their variances.

One other problem is that the 8 x 8 intercorre-
lation matrix was built by averaging 4 matrices of
similar correlations. The present writers know
of no proof that such a matrix will be Gramian.
The requirement of a Gramian matrix, in any case,
will carry less import in the centroid technique
than in a principal component-type analysis.
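
Whether a particular pooled matrix is Gramian (symmetric with no negative eigenvalues) can be checked numerically even without a general proof. The sketch below is one such check; the tolerance and the example matrices are hypothetical choices, and nothing here is taken from the article's Table I.

```python
import numpy as np


def is_gramian(R, tol=1e-10):
    """True if R is symmetric and has no eigenvalue below -tol."""
    R = np.asarray(R, dtype=float)
    if not np.allclose(R, R.T):
        return False
    return bool(np.linalg.eigvalsh(R).min() >= -tol)


# A consistent set of correlations, and an impossible one: variables 1 and 2
# both correlate .9 with variable 2 and 3 respectively, yet 1 and 3 are at -.9.
consistent = np.array([[1.0, 0.5, 0.4],
                       [0.5, 1.0, 0.3],
                       [0.4, 0.3, 1.0]])
inconsistent = np.array([[1.0, 0.9, -0.9],
                         [0.9, 1.0, 0.9],
                         [-0.9, 0.9, 1.0]])
```

A plain cell-by-cell mean of Gramian matrices is itself Gramian, but averaging in the z scale and transforming back, or replacing the diagonal with communality estimates, carries no such guarantee, so a direct check of this kind is a reasonable precaution.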


Summary 


Course grades in a short missile technician’s 
course were studied in relation to aptitude char- 
acteristics as reflected in eight Army Classifica- 
tion Battery tests. There were four classes, in- 
cluding a total of 119 enlisted trainees, for whom 
complete data were available. 

The Army Classification Battery tests were 
factored by the centroid method. Two centroid 
factors were extracted and rotated graphically. 
The first factor was determined to be a Technical 
factor composed of the following tests: Electronic 
Information, Shop Mechanics, Mechanical Apti- 
tude, and Automotive Information. The second 
factor was defined as an Academic factor charac- 
terized by Arithmetic Reasoning, Pattern Analysis, 
Reading and Vocabulary, and Army Clerical Speed. 

Four written test score distributions were in-
volved in the missile course: one test after each
of three parts of the course, and a combination of
all three tests to form a total course average.
Each of the four distributions was projected sep-
arately onto the factorial structure determined by
the aptitude variables. The first test covered
Missile Mechanic training and was more closely
associated with the Academic factor than with the
Technical factor. The second test in the course,
covering Launcher Area training, and the total
course average proved to have twice as much
identification with the Technical factor as with the
Academic factor. The third test, Missile Prepar-
ation and Depreparation, had very little relation to
the system of aptitude tests; the attending commu-
nality was located, however, on the Technical factor.
Suggestions were given for further training research
with the Army Classification Battery tests.

The method of the present study was discussed 
in relation to future training research. The cen- 
troid method of factor analysis, and the projection 
of variables onto the resulting factorial structure, 
was suggested for further exploratory studies in 
the area of training. 


FOOTNOTES 


* Permission is granted for reproduction, trans-
lation, publication, use and disposal in whole
and in part by or for the United States Govern-
ment. The opinions expressed are solely those
of the authors and are in no way official; nor
are they to be construed as representing those
of HumRRO or any other bureau or agency.


** The authors wish to express their appreciation
to Mrs. Lynne Von Kanel for her most valuable
assistance in the computations involved in the
present study. Present address of Dr. Anderson:
American Institute for Research, 11607 Washington
Place, Los Angeles 6, California.


1. This test was built by personnel in the U. S.
Army Air Defense School, Fort Bliss, Texas.


2. ‘‘Relationship’’ herein refers to the square of
the obtained factor loadings, which indicate di-
rectly the proportion of variance involved.


ESTABLISHING CRITERION GROUPS FOR
EVALUATING MEASURES OF CURIOSITY


University of Delaware
ETHEL W. MAW 
Bryn Mawr College 


CURIOSITY HAS long been considered an im-
portant attribute, if not always one to be encour-
aged (4). In recent years it has received in-
creasing attention because of its relationship to
creativity. Most schools of psychology implicit- 
ly or explicitly indicate an awareness of it (2, 7). 
Curiosity or exploratory behavior of lower animals 
has been rigorously investigated (3,6), and some 
effort has also been made to study curiosity of 
human beings (1). 

Although a considerable body of literature has 
accumulated in which curiosity has been discussed, 
very few empirical studies have been directly 
concerned with children’s curiosity. One reason 
for the dearth of investigations involving children
may be the lack of instruments with which to 
measure curiosity. 

A major problem in developing measures of 
curiosity is establishing validity. One approach 
is to identify a group of children of high curiosity 
and another group of low curiosity and to see
whether proposed measures of curiosity discrim-
inate between the two groups. Isolating such cri-
terion groups was the purpose of the present in-
vestigation.


Securing Judgments of Children’s Curiosity 





It was first necessary to define curiosity in 
terms of behavior. To do this, many statements 
made by philosophers and psychologists through- 
out history as well as reports of empirical inves- 
tigations were analyzed. The analysis is report- 
ed elsewhere. ** 

As a result, for the purposes of this study, an
elementary school child was said to exhibit curi-
osity to the extent that he:

1. reacts positively to new, strange, incongru-
ous, or mysterious elements in his envi-
ronment by moving toward them, by explor-
ing them or by manipulating them,

2. exhibits a need or a desire to know more
about himself and/or his environment,

3. scans his surroundings seeking new exper-
iences,

4. persists in examining and exploring stimuli
in order to know more about them.


*Footnotes will be found at end of article.


The teachers of five fifth-grade classes were 
asked to judge the curiosity of their pupils in light 
of this definition. At the same time, the pupils 
were asked to judge their peers and themselves. 
Thus three judgments of the curiosity of the chil-
dren of each class were secured—teacher-judg-
ment, peer-judgment, and self-judgment.

Efforts were made to minimize errors inher-
ent in this approach. The evaluative instruments
were administered to all classes by one person
who used the same procedure in each case. Each
class was observed by an independent observer 
one hour per week for ten weeks. The cumula- 
tive record of each child was reviewed and ser- 
ious discrepancies between the record and the 
judgments of the evaluators were studied on an 
individual basis. 

Three instruments were developed for the dif- 
ferent types of judgments. Each instrument was 
constructed to include all aspects of the operation- 
al definition. In addition, the instruments used 
by children were either carefully controlled in 
terms of vocabulary or were read to them. 

Each teacher was given the definition of curi- 
osity together with an example for each part of 
the definition. She was told that all of the kinds 
of behavior included might not be observable in 
any one child, but that it was reasonable to sup- 
pose that the more of them a child showed, the 
more curious he was. The teacher was also 
cautioned that the child who showed the most cur- 
iosity might not be the one who was making the 
best classroom adjustment. 

Each teacher was asked to rate her pupils in 
the following manner: 

1. Write the name of the child you consider to 
have the most curiosity on the first line. 

2. Write the name of the child you consider to
have the least curiosity on the line corresponding 
to the number of pupils in your class. 

3. Next, write the name of the child you would 
rank second in curiosity on line two. 

4. Then write the name of the child you con- 
sider to have next to the least curiosity on the 
line above the name of the child having the least 
curiosity. 

5. Continue ranking in this manner until you 
have ranked all of the children in your class. 


Since this method appeared to be too compli- 
cated for the majority of the children to use to 
evaluate their peers, a Who-Should-Play-the- 
Part test was created. The children were given 
the description of eight roles—four of children 
whose behavior exhibited much curiosity and four 
whose behavior exhibited little or none. They 
were told to select children in their classrooms 
who were generally most like the characters re- 
quired for the play. 

Typical role descriptions are the following: 


Part 1. This part will be played by a class- 
mate who keeps working for a long time trying to 
understand anything new which can be examined. 
This pupil sticks to problems trying to solve 
them. This member of the class is the last to 
give up when the class is looking for answers to 
questions. This pupil keeps asking questions af- 
ter everyone else has stopped and will remain 
working on strange things after others are done. 
This child often takes things apart, but will work 
a long time to put them together to find out how 
they work. 

Part 8. This part will be played by a pupil who 
misses seeing the things that other members of 
the class see easily. This pupil is not easily dis-
turbed by things that happen in the classroom.
When something new or strange is brought into 
the classroom, this classmate often does not look 
at it or just gives it a slight glance. 

The pupils were asked to evaluate themselves 
using an instrument consisting of 41 statements 
regarding habits and attitudes and entitled ‘‘About
Myself.’’ They were told that there were no
‘‘right’’ or ‘‘wrong’’ answers, that the best an-
swer was what they thought was true of themselves.
Although the majority of the items were stated in
terms of behavior that logically indicated curios-
ity, some statements were used that implied just
the opposite. Each of the items was related to
one or more points of the operational definition.

The following are typical items from the About
Myself test:


When there is something new in the room, I notice
it right away.
___ Never ___ Sometimes
___ Often ___ Always

I like to discover new things.
___ Never ___ Sometimes
___ Often ___ Always

I keep away from strange and unusual things.
___ Never ___ Sometimes
___ Often ___ Always

When I see a strange machine, I go up to it and
look at it.
___ Never ___ Sometimes
___ Often ___ Always

All of the judgments of curiosity were made 
during the first week of March after the teachers 
and classes had been working together for at least 
six months. In one classroom the teacher and 
pupils had been together for sixteen months. 


Selecting Groups of High- and Low-Curiosity Children

It was anticipated that certain factors (age,
race, popularity, sex, and intelligence) might
have some effect on the judgment of the evaluators.

Age was kept within narrow limits by restrict-
ing the study to one grade in school. The fifth
grade was selected to lessen reading problems 
and to secure children whose interests were less 
canalized than those of adolescents. 

At the beginning of the study, the pupils’ ages 
ranged from ten years, two months to twelve 
years, eight months. Their Lorge-Thorndike in- 
telligence quotients ranged from 71 to 139. 

Of the 158 children, thirteen were Negroes:
two girls and eleven boys. The question was raised
regarding the extent to which racial bias might in-
fluence judgments made by the teachers or the pu-
pils. In one room there was one Negro. In each
of the two other rooms there were six Negroes.
In one of these, the teacher and the pupils placed 
the same three children in the upper half of the 
class and the other three in the lower half. In the 
other room, the teacher placed five of the chil- 
dren in the upper half while the peers judged four 
to be in the lower half. By using Fisher’s Exact 
Probability Test, it was determined that this was 
not significant at the .05 level and therefore should 
be considered a chance arrangement. 
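
The probability test mentioned above can be illustrated with a short calculation. The sketch below is illustrative only: the article does not give the exact table that was tested, so the 2 x 2 arrangement (teacher and peer placements of the six children in the second room) and the function name `fisher_exact_two_sided` are assumptions.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact probability for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Assumed layout. Rows: teacher-judgment, peer-judgment.
# Columns: children placed in the upper half, children placed in the lower half.
p_value = fisher_exact_two_sided(5, 1, 2, 4)
print(round(p_value, 3))
```

Under this assumed arrangement the two-sided probability is well above .05, which agrees with the conclusion that the placement should be considered a chance arrangement.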

Another question to be answered was whether 
the substantial agreement found between the judg- 
ment of teachers and pupils could be accounted for 
by popularity. In other words, did teachers and 
peers tend to rate popular pupils high in curiosity 
and less popular pupils less high? In order to de- 
termine the popularity of the children, the Ohio 
Social Acceptance Scale was administered. The
effect of popularity on the correlation between
teacher-judgment and peer-judgment was then
partialled out. Table I shows that the effect of
popularity was negligible.

It has been rather consistently reported that,
in our society, there exists considerable rivalry
between the sexes at preadolescence. If sex ri-
valry existed in these classes, to what extent did
it influence pupils’ judgment of the curiosity of
peers of the opposite sex? To answer this ques-
tion, the rankings which pupils of each sex gave
to like-sex and opposite-sex peers were corre-
lated with each other and with the rankings made
by the teachers. The majority of the correlations
were moderate to high. All of them were positive.

While these findings indicated that there was
considerable agreement among boys, girls, and
teachers in judging curiosity, they did not answer
the question of whether, on the whole, members
of one sex were judged to be more curious than
the members of the other. Two approaches were
made to answer it. The differences between the
means of the scores the peers had assigned to the
boys and girls and the differences between the
number of boys and girls the teachers had placed
in the upper and lower halves of the classes were
examined.

Table II shows the mean and variance of peer
judgment of curiosity of boys and girls for each
class. It shows that in three of the classes the
boys were assigned scores that were significant-
ly more variable than those assigned to the girls.
In other words, in these rooms there were boys who
were scored much higher than the girls in the room.
There were also boys who were scored much lower.

TABLE I

AGREEMENT BETWEEN TEACHERS AND PEERS IN RANKING PUPILS
IN FIVE CLASSROOMS AS TO THE EXTENT TO WHICH THEY
EXHIBIT CURIOSITY WITH THE EFFECT OF POPULARITY
PARTIALLED OUT

Room    Level of Significance    Partial Tau

        .0018                    .31
        .0052                    .30
        .0188                    .21
        .0005                    .39
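
Table I reports partial tau values. One common way of partialling a third ranking out of the agreement between two rankings is Kendall's partial rank correlation. Whether exactly this formula was used in the study is an assumption, and the input values in the sketch below are hypothetical, not taken from the data.

```python
from math import sqrt


def partial_tau(tau_xy, tau_xz, tau_yz):
    """Kendall partial tau between x and y with z held constant."""
    denominator = sqrt((1 - tau_xz ** 2) * (1 - tau_yz ** 2))
    return (tau_xy - tau_xz * tau_yz) / denominator


# Hypothetical values: x = teacher-judgment, y = peer-judgment, z = popularity.
adjusted = partial_tau(tau_xy=0.40, tau_xz=0.20, tau_yz=0.25)
print(round(adjusted, 3))
```

When popularity is only weakly related to both judgments, as in these made-up values, the adjusted coefficient stays close to the unadjusted one, which is the pattern the text describes as a negligible effect.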


Since the differences between some of the var-
iances were significant, the differences between
the means were tested with the Mann-Whitney U
test. An examination of the means reveals that in
some classrooms the girls’ means were higher
than the boys’ means. In others, the opposite was
true. In no case was the difference significant.
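
The Mann-Whitney U statistic used here depends only on the ordering of the scores, not on their variances. A minimal sketch of the statistic follows. The scores are hypothetical, the function name is an assumption, and no significance table is consulted.

```python
def mann_whitney_u(xs, ys):
    """Return (U for xs, U for ys), counting each tie as one half."""
    u_x = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u_x += 1.0
            elif x == y:
                u_x += 0.5
    u_y = len(xs) * len(ys) - u_x
    return u_x, u_y


# Hypothetical peer-assigned curiosity scores for one classroom.
boys = [12, 3, 15, 1, 9]
girls = [7, 8, 6, 10]
u_boys, u_girls = mann_whitney_u(boys, girls)
print(min(u_boys, u_girls))
```

The smaller of the two U values is the one conventionally compared with a tabled critical value. In this made-up example the boys' scores are more spread out than the girls' scores, yet the two U values are close to each other.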

Although, on the whole, the teachers placed
more boys than girls in the upper half of the class
and more girls than boys in the lower half, this
arrangement was not consistent in all classes as
is shown in Table III. When tested, using the chi
square technique, the differences were found not
to be significant. In light of the evidence it seems
safe to assume that the ratings were not unduly
influenced by sex preference.

Intelligence was found to be substantially relat-
ed to both teacher-judgment and peer-judgment of
curiosity. Teacher-judgment tended to be more
consistently related to intelligence than did pupil-
judgment. In all cases, the relationship was pos-
itive and significant as is shown in Table IV.

To obtain groups of pupils of high and low cur- 
iosity independent of intelligence, lines of regres- 
sion of curiosity on intelligence were drawn for 
each class and pupils whose scores deviated from 
the regression lines were selected. Pupils whose 
scores were at least one-half standard error of 
estimate above the regression line were included 
in the high curiosity group; those whose scores 
were at least one-half standard error of estimate 
below were included in the low curiosity group. 
This was done separately for teacher- and peer-
judgment. The high-curiosity and low-curiosity
groups established from the two different judg-
ments were then compared. Only pupils who were
placed similarly by the two judgments were re-
tained in the tentative criterion groups.

TABLE III

SEX COMPOSITION OF UPPER AND LOWER HALVES OF CLASS ACCORDING TO TEACHER-
JUDGMENT OF CURIOSITY, CHI SQUARE, AND LEVEL OF SIGNIFICANCE IN
FIVE FIFTH-GRADE CLASSES

        Lower Half        Upper Half                      Level of
Room    Boys    Girls     Boys    Girls     Chi Square    Significance

9 8 13
8 5
.81
D 1.14
E 3 8 1.92

*Where the number in the class is odd, the person in the middle is counted with the lower group.

TABLE IV

PRODUCT-MOMENT CORRELATION COEFFICIENTS BETWEEN
CURIOSITY AND INTELLIGENCE* OF PUPILS IN FIVE
FIFTH-GRADE CLASSES

                Teacher-Judgment of      Peer-Judgment of
Room    N       Curiosity and I.Q.       Curiosity and I.Q.

31
. ol
D .43
E 57 28 .63

*Intelligence quotients based on Lorge-Thorndike Intelligence Test.
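
The selection rule described above (scores at least one-half standard error of estimate above or below the regression line of curiosity on intelligence, with only pupils placed alike by both judgments retained) can be sketched as follows. The data are hypothetical, the function names are assumptions, and the standard error of estimate is computed here with n - 2 degrees of freedom, which may differ from the formula used in the study.

```python
from math import sqrt


def select_criterion_groups(iq, curiosity, k=0.5):
    """Return (high, low) index lists for pupils at least k standard errors
    of estimate above or below the regression line of curiosity on I.Q."""
    n = len(iq)
    mean_x = sum(iq) / n
    mean_y = sum(curiosity) / n
    sxx = sum((x - mean_x) ** 2 for x in iq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(iq, curiosity))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(iq, curiosity)]
    # Standard error of estimate: residual spread around the regression line.
    see = sqrt(sum(r * r for r in residuals) / (n - 2))
    high = [i for i, r in enumerate(residuals) if r >= k * see]
    low = [i for i, r in enumerate(residuals) if r <= -k * see]
    return high, low


def agreed(group_a, group_b):
    """Keep only pupils placed in the same group by both judgments."""
    return sorted(set(group_a) & set(group_b))


# Hypothetical class of five pupils.
iq = [90, 100, 110, 120, 130]
teacher_scores = [10, 20, 15, 30, 25]
peer_scores = [12, 22, 14, 27, 30]

high_t, low_t = select_criterion_groups(iq, teacher_scores)
high_p, low_p = select_criterion_groups(iq, peer_scores)
tentative_high = agreed(high_t, high_p)
tentative_low = agreed(low_t, low_p)
print(tentative_high, tentative_low)
```

Because the groups are defined by residuals from the regression line, a pupil with a modest I.Q. and a moderately high curiosity score can enter the high group while a brighter pupil with the same raw score does not.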

The five regression equations based upon 
teacher-judgment of curiosity were not signifi-
cantly different from each other; those based up-
on peer-judgment were significantly different. In
the latter case, a regression equation was written
for the five classes as one group. The tentative 
criterion groups yielded by the use of this equa- 
tion did not differ from those established by the 
use of the five separate equations. 

There were fifteen children who were judged 
by both teachers and peers to exhibit more curi- 
osity than would be predicted on the basis of their 
intelligence quotients. There were also twenty 
children who were judged by both teachers and 
peers to exhibit less curiosity than would be ex-
pected on the basis of their intelligence quotients.
These groups respectively made up the tentative 
high- and low-curiosity groups. Table V shows 
that the two groups did not differ significantly in 
age, popularity, or intelligence. 

Questions pertaining to race and sex bias were 
raised earlier. It seemed pertinent to raise the 
same questions in regard to the tentative criter- 
ion groups. The limited number of Negroes made 
it impossible to reach definite conclusions, but 
two Negroes were included in the criterion groups,
one in the high curiosity group and one in the low. 
Ten boys and five girls were in the high curiosity 
group, and seven boys and thirteen girls were in 
the low group. When tested, this was found to be 
a chance arrangement. It seemed likely that race 
and sex membership had not unduly influenced 
judgment of curiosity. 
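
The sex composition of the tentative criterion groups (ten boys and five girls in the high group, seven boys and thirteen girls in the low group) can be checked with a 2 x 2 chi square. The article does not name the test that was applied, so the choice of chi square, with and without the Yates correction, is an assumption made for illustration.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi square (1 degree of freedom) for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction for small samples.
        diff = max(0.0, diff - n / 2)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows: high-curiosity group, low-curiosity group. Columns: boys, girls.
chi2 = chi_square_2x2(10, 5, 7, 13)
chi2_yates = chi_square_2x2(10, 5, 7, 13, yates=True)
CRITICAL_05 = 3.841  # tabled chi square value, 1 degree of freedom, .05 level
print(round(chi2, 2), round(chi2_yates, 2), chi2 < CRITICAL_05)
```

Both values fall below the tabled critical value, which is consistent with the statement that the arrangement was a chance one.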

Finally, the self-judgments of the children in
the tentative criterion groups were examined. On
the whole, the children in the high-curiosity group
gave themselves higher scores on curiosity than
did the children in the low-curiosity group. As
shown in Table VI, the difference between the
means was significant at the .005 level.

It was concluded that the tentative criterion 
groups were sufficiently established to be consid- 
ered the actual criterion groups. They were, 
therefore, used to validate items intended to 
measure the curiosity of elementary school chil- 
dren. 


Summary 


The purpose of the investigation was to estab-
lish criterion groups of elementary school chil-
dren for evaluating measures of curiosity. Teach-
er-judgment, peer-judgment, and self-judgment
of curiosity of children in five fifth-grade classes
were obtained after the classes had been working
together for at least six months.

The judgments of teachers and peers were not 
significantly affected by race, sex, or popularity, 
but were significantly related to intelligence. 
With statistical control of intelligence, high- and 
low-curiosity groups were selected on the basis 
of teacher- and peer-judgment. On a self-ap- 
praisal of curiosity the children of the high-curi- 
osity group rated themselves significantly higher 
in curiosity than did the children of the low-curi- 
osity group. The final criterion groups consisted 
of fifteen children of high curiosity and twenty of 
low curiosity. 


FOOTNOTES 


* The research reported herein was performed
pursuant to a contract with the United States
Office of Education, Department of Health, Ed-
ucation and Welfare.

**A 29-page report describing this analysis and
giving complete data in tables has been deposit-
ed with the American Documentation Institute.
Order Document No. 6558, remitting $2.00 for
35 mm. microfilm or $3.75 for 6 x 8 in. photo-
copies. Advance payment is required.


AN INVESTIGATION OF THE TEACHING OF
CHRONOLOGY IN THE SIXTH GRADE*


VAL E. ARNSDORF 
University of California, Berkeley 


THE TEXTBOOKS basic to the teaching and 
learning of the social studies are replete with 
words and phrases relating to time and historical 
chronology. Despite the frequent occurrence of
such terms, little is known about children’s abil- 
ity to comprehend them and to use them effective- 
ly in dealing with the problem of this subject mat- 
ter field. Moreover, little is known--perhaps 
even less--about instructional materials and 
methods which may increase understanding of 
these terms. 

This article reports one attempt to secure re- 
search information in this relatively neglected a- 
rea. The investigation involved the use of two 
comparable groups in Grade 6. While both groups 
studied the same social studies unit for seven 
weeks, namely, ‘‘Ancient Civilizations,’’ no ef-
fort was made to assist the teachers and the pu-
pils in the control group. The teachers and the
pupils of the experimental group, on the other
hand, had the opportunity to try out the effect of
specific aids to be described shortly. It is to be
borne in mind, however, that prior to their use in
this experiment, there was little empirical evi-
dence that these aids would be really beneficial.


Previous Related Research 





There is in general no lack of research on the 
factors that influence children’s ability to under- 
stand subject matter and to cope with the problems 
therein; but the research which relates specific-
ally to understanding and problem solving in sub-
ject matter making large use of time and histori-
cal chronology is very limited indeed. The three
most pertinent research reports are reviewed 
very briefly. 

Oakden and Sturt (1922) (5) investigated the
developmental process in acquiring understanding
of chronology concepts, using a series of tests.
They concluded from their test results and from
their observations that growth in understanding of
time is slow and that an adult level of understand-
ing is not attained before the age of thirteen or
fourteen. While they were able to demonstrate 
differences in rates of development, they did not 
set up any particular program of instruction de- 
signed to accelerate development. 

Pistor (1940) (6) studied the effects of training 
on the development of time concepts. His method 
was to compare the results of two equivalent 
groups, one following a separate subject approach 
in history and geography in grades four and five, 
the other following, basically, a geographical ap- 
proach with history taught incidentally. The mean 
scores of the two groups on the final tests of time 
concepts were about equal. Greater growth, but 
not significantly so, was reported after repeating 
the experiment with a similar program in grade 
six. According to Pistor, his evidence indicated 
that ‘‘maturation’’ is dominant over training in the 
development of time concepts. There seems to 
be, in his data, little support for this hypothesis.

A third study, Friedman’s (1944) (3), dealt with
the variety and importance of time concepts in the 
life of both children and adults. Using interviews, 
tests, and time-lines, Friedman found that child- 
ren have a slight understanding of time when they 
enter school and progress in understanding with 
each succeeding grade, attaining full comprehen- 
sion of our conventional time system by the sixth 
grade and approaching maturity in comprehension 
of time concepts by the tenth grade. The rela-
tionships reported between scores on tests dealing
with time on the one hand and intelligence on the 
other were low, as were also the relationships 
between test scores and socio-economic status. 
Differences in performance on the tests by boys
and girls were not significant.

* Abstract of an unpublished dissertation by the same title, offered as partial fulfillment of the require-
ments for the Ph.D. degree at the University of Minnesota, 1959.

As a group, these studies agree in emphasiz- 
ing the difficulties involved in understanding time 
concepts; also in concluding that maturity in such 
understanding is the result of a slow developmen-
tal process. However, none of the three provides
much evidence on the effect of organized instruc-
tional attempts to increase children’s ability to
understand time concepts as they are employed in
the social studies. It was for this reason that the
present investigation was undertaken. 


The Present Study 





Subjects. The experiment proper was delayed 
until the initial testing program could be com-
pleted. At this time, the Lorge-Thorndike Intel-
ligence Test, Non-Verbal Battery, Level III, was
administered. (The scores were later used also
to divide the groups into three ability levels,
from top to bottom, 110-145, 95-109, and 50-94.)
Table I summarizes data on sex, average class 
size, and average I.Q. of the experimental and 
control groups, showing them to have been close- 
ly comparable on the factors mentioned. All sub- 
jects were drawn from the same mid-western 
city school system. 

Differences in Instruction. As has been stated 
above, both experimental and control groups 
studied the same social studies unit from the 
same textbooks and for the same period of time. 
The instructional program of the experimental 
group, unlike that of the control group, was in-
tended to identify ahead of time certain anticipat-
ed learning difficulties and to provide assistance
in meeting them. It is to be re-emphasized, 
however, that the devices and suggestions offered 
had not been previously tested in order to deter- 
mine their worth. 

In general terms, the forms of special assis- 
tance given the experimental classes in the seven- 
week period were as follows: 

1. Specific identification of, and instruction 
in, every term in the texts relating to time. 

2. Use of various time-lines and charts to add 
concreteness to such abstract terms. 

3. The writing of biographical and autobio- 
graphical sketches to stimulate interest in past 
and present events. 

Testing Instruments Employed. Reference has 
already been made to the uses of 1) the Lorge- 
Thorndike Intelligence Test. 2) The comprehen- 
sion section of the Gates Survey Reading Test was 
used to control reading ability as a factor in later
analyses. 3) Gains in the abilities essential to
success in the social studies were determined by
administering, both initially and finally, the Iowa
Every-Pupil Test B, Basic Work-Study Skills.

4) In order to measure increases in the un-
derstanding of historical time, the writer pre-
pared an original test battery which was given
both groups of pupils before and after the experi- 
mental period. Included were the following six 
sub-tests: Vocabulary of Chronology, Ordering 
Four Events without Dates, Ordering Two Events 
without Dates, Relative Time, Ordering Four E- 
vents with Dates, and Time Absurdities. The i- 
tems included in this battery were selected from 
information found in reference books, basal texts, 
and supplemental social studies books available 
to pupils in the intermediate grades. Reliability 
coefficients for the six tests, based on the results
of the initial testing program, ranged from .98 to
.61 and indicate some variation in the consistency
of these measures. Inter-correlations among the 
tests ranged from .581 to .014 and reveal that the 
tests did differentiate fairly well various ways of 
measuring the understanding of historical time. 

Hypotheses Tested. The effects of the differ- 
entiated programs of instruction were measured 
principally with respect a) to gains in basic study 
skills and b) to gains in understanding of time, as 
determined by scores on the six separate tests. 
In each type of comparison, there was interest in 
the sex of the learners and in their levels of in- 
tellectual ability. The method of analysis adopt- 
ed may be illustrated with respect to gains in 
basic study skills, which led to a total of four hy-
potheses, since sometimes additional controls
(e.g., reading comprehension) were employed, 
and sometimes they were not. The four hypothe- 
ses for basic study skills are: 

1. There is no significant difference in the 
final mean scores on the tests of basic study 
skills a) for the experimental and for the control 
groups; b) for boys and for girls; and c) for levels
of intellectual ability; nor is there d) a significant
interaction between treatment* and sex or level 
of intellectual ability. 

2. There is no significant difference in the 
final mean scores on the tests of basic study 
skills, when the initial scores on these tests are 
controlled a) for the experimental and for the 
control groups; b) for boys and for girls; and c) 
for levels of intellectual ability, nor is there d) 
a significant interaction between treatment* and
sex or level of intellectual ability. 

3. There is no significant difference in the 
final mean scores on the tests of basic study 
skills when reading comprehension scores are 
controlled a) for the experimental and for the 
control groups; b) for boys and for girls; and c) 
for levels of intellectual ability, nor is there d) 
a significant interaction between treatment* and
sex or level of intellectual ability. 

4. There is no significant difference in the
final mean scores on the tests of basic study
skills when both the initial scores on these tests
and on reading comprehension are controlled a)
for the experimental and for the control groups;
b) for boys and for girls; and c) for levels of in-
tellectual ability; nor is there d) a significant in-
teraction between treatment and sex or level of
intellectual ability.

* The single word ‘‘treatment’’ is used here, in the interest of brevity, to refer to method of instruc-
tion, differentiated as previously explained for the experimental and the control groups.

No single score was computed for understand- 
ing of time and historical chronology. Instead, 
the gains on each of the six sub-tests are treated 
separately. For each of these, then, there are 
four null hypotheses corresponding to the four for 
basic study skills. The first null hypothesis for 
study skills needs merely to be re-worded, to get 
six hypotheses relating to understanding of time, 
by substituting in turn Vocabulary, 4-Event Or-
der, etc., for the term study skills. There is
therefore no reason to state in full the resulting
twenty-four null hypotheses, especially since
their nature will be clear enough when the findings 
are presented. 
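
The count of twenty-four follows from crossing the six sub-tests with the four conditions of statistical control. A brief sketch of that enumeration is given below. The wording of each generated statement is illustrative and abbreviated, not the article's own.

```python
from itertools import product

SUB_TESTS = [
    "Vocabulary of Chronology",
    "Ordering Four Events without Dates",
    "Ordering Two Events without Dates",
    "Relative Time",
    "Ordering Four Events with Dates",
    "Time Absurdities",
]

CONTROLS = [
    "no additional control",
    "initial scores controlled",
    "reading comprehension scores controlled",
    "initial and reading comprehension scores controlled",
]

# One null hypothesis for every pairing of sub-test and control condition.
hypotheses = [
    f"No significant difference in final mean scores on {test} ({control})"
    for test, control in product(SUB_TESTS, CONTROLS)
]
print(len(hypotheses))
```

Each generated statement stands for a hypothesis that, like those written out for basic study skills, has parts a) through d).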

Statistical Procedures. All the null hypotheses 
were tested by an approximate method of analysis 
of variance and covariance. The unequal frequen- 
cies resulting from the multiple classification of 
the data prevented the use of a more exact test. 
The robust character of the basic technique of an- 
alysis of covariance (1, 2, 4) provides adequate 
defense for the techniques used, even though not 
all assumptions were satisfied. This fact, how- 
ever, makes it necessary to be somewhat guard- 
ed in interpreting results. 
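
For readers unfamiliar with the technique, the simplest case of analysis of covariance (one classification, one covariate) is sketched below. This is not the approximate multiple-classification method used in the study. The function name and the data are hypothetical, and the sketch only shows how final scores are adjusted for initial scores before groups are compared.

```python
def ancova_f(groups):
    """One-way analysis of covariance with a single covariate.

    groups: list of groups, each a list of (covariate, outcome) pairs.
    Returns (F, df_between, df_within) for the adjusted group means.
    """
    pairs = [p for g in groups for p in g]
    n, k = len(pairs), len(groups)

    def sums(data):
        # Sums of squares and cross-products about the means of `data`.
        mx = sum(x for x, _ in data) / len(data)
        my = sum(y for _, y in data) / len(data)
        sxx = sum((x - mx) ** 2 for x, _ in data)
        sxy = sum((x - mx) * (y - my) for x, y in data)
        syy = sum((y - my) ** 2 for _, y in data)
        return sxx, sxy, syy

    t_xx, t_xy, t_yy = sums(pairs)
    w_xx = w_xy = w_yy = 0.0
    for g in groups:
        sxx, sxy, syy = sums(g)
        w_xx += sxx
        w_xy += sxy
        w_yy += syy

    # Outcome variation remaining after regression on the covariate.
    adj_total = t_yy - t_xy ** 2 / t_xx
    adj_within = w_yy - w_xy ** 2 / w_xx
    adj_between = adj_total - adj_within
    df_between, df_within = k - 1, n - k - 1
    f_value = (adj_between / df_between) / (adj_within / df_within)
    return f_value, df_between, df_within


# Hypothetical (initial score, final score) pairs.
experimental = [(10, 14), (12, 15), (14, 19), (16, 20)]
control = [(10, 11), (12, 14), (14, 14), (16, 17)]
f_value, df1, df2 = ancova_f([experimental, control])
print(f_value, df1, df2)
```

The resulting F would be referred to a table with the returned degrees of freedom. With unequal cell frequencies in a multiple classification, as in the study, further adjustments are needed that this sketch does not attempt.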

Gross Findings--Basic Study Skills. At this
point, only the data relevant to the comparisons
labeled a) in the four null hypotheses are treated,
those for comparisons b), c), and d) being post-
poned for the time being. In other words, the
findings here in question relate to the experimen-
tal and the control groups as wholes, under the
four conditions described. The quantitative facts
are assembled in Table II. None of the differen-
ces between the two groups so far as basic study
skills are concerned proved to be significant,
whether the comparisons are based upon the test
scores alone or on the test scores with three
types of control: 1) initial scores on the tests, 2)
reading comprehension scores, and 3) both 1) and
2). What this means is that the instruction de-
signed to increase the ability to deal with time
concepts had no discernible effects upon study
skills as such. Perhaps none should have been
anticipated since the types of training given the
experimental group were specialized and could
hardly be expected to develop skills either that
are represented as such in the study skills test or
that could be generalized to reveal their presence
through transfer.

Sub-abilities in Understanding Historical Time.
Table III, like Table II, contains the results of
comparing relative gains for the experimental and
the control groups as wholes. For each of the six
tests of time understanding there are four null
hypotheses, as already explained, to make a total
of twenty-four. In all twenty-four instances, the
advantage lay with the experimental group, but in
two, by margins so small as to be completely
negligible (2-event order: no dates). In the re-
maining twenty-two cases, all except four differ-
ences are significant.

Of the sub-abilities measured, one produced 
no significant differences at all in the relative 
gains made by the two groups. The correspond- 
ing test required pupils to arrange in correct 
temporal sequence two undated historical events. 
Why the instruction on the understanding of time 
given to the experimental group produced no ef- 
fects at this point is not known. 

Of the remaining five sub-abilities, the one 
making the poorest record (that is, favoring the 
experimental group least often) was 4-event or- 
der: no dates; and its similarity in psychological 
demands to the least successful of all (2-event
order: no dates) is obvious. The experimental 
subjects surpassed the control subjects only a) 
when there were no controls at all (beyond that 
introduced by the differentiated instruction itself) 
and b) when reading comprehension scores were 
statistically controlled. 

Marked success attended efforts to teach child-
ren in the experimental group a) the vocabulary
of historical time, b) the ability to distinguish be-
tween the relative times of events, c) skill in
placing four dated events in correct temporal or- 
der, and d) the ability to detect time absurdities. 
All sixteen potential differences are significant. 
Since care was taken to prevent teaching which 
would amount to coaching for the tests, it is be- 
lieved that the superiorities of the experimental 
children in these four sub-abilities are realities 
and not artifacts. 
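
The counts reported in this section can be cross-checked with simple arithmetic. The tally below takes, for each sub-test, the number of the four control conditions under which the text reports a significant difference. The dictionary is a reading of the prose above, not a transcription of Table III.

```python
# Significant differences (out of four control conditions) per sub-test,
# as described in the text of this section.
significant = {
    "Vocabulary": 4,
    "4-event order: no dates": 2,
    "2-event order: no dates": 0,
    "Relative time": 4,
    "4-event order: dates": 4,
    "Time absurdities": 4,
}

CONDITIONS = 4
total = CONDITIONS * len(significant)
n_significant = sum(significant.values())
n_not_significant = total - n_significant
print(total, n_significant, n_not_significant)
```

The six non-significant comparisons correspond to the two negligible margins plus the four further exceptions mentioned in the text, and the four fully successful sub-abilities account for the sixteen significant differences.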

Findings Related to Sex and Ability Sub-groups. 
In this section of the report it will be possible to 
include only part of the data obtained when exper- 
imental and control groups were treated not as 
wholes, but in terms of sex sub-tests and of 
sub-groups made upon the basis of intelligence
test scores. There were three sub-groups of the 
latter kind. 

In Table IV are presented the F values obtain-
ed when, with the initial scores controlled in each
case, comparisons were made of the relative 
gains on the seven criteria--basic study skills, 
and the six aspects of time understanding taught 
the experimental group and tested in both groups 
before and at the conclusion of the experimental 
period. 

According to the figures in the upper row of 
the table, the topmost of the three intelligence 
groups, whose scores had surpassed those of the 
middle and lowest groups on the preliminary 
tests, maintained their superiority and added to
it by significant differences in all but one of the
seven types of measure, namely, relative time.

TABLE III

SIX SUB-ABILITIES, UNDERSTANDING OF TIME: SUMMARY OF F-VALUES OF THE
ANALYSIS OF VARIANCE AND CO-VARIANCE TESTS OF THE SIGNIFICANCE OF
DIFFERENCES BETWEEN MEANS OF THE EXPERIMENTAL AND CONTROL GROUPS

Columns: Vocabulary; 4-event order: no dates; 2-event order: no dates;
Relative time; 4-event order: dates; Time absurdities

E and C; no other controls                             26.89*  11.47*  2.25  40.99*
E and C; initial scores controlled                     17.33  2.67
E and C; reading comprehension scores controlled       28.55*  11.43*
E and C; both initial and reading scores controlled    19.15*

* Differences significant at the .01 level.

TABLE IV

SUB-GROUPS, SEX AND INTELLECTUAL LEVEL: SUMMARY OF F-VALUES OF THE
ANALYSIS OF COVARIANCE TESTS OF THE SIGNIFICANCE OF DIFFERENCES,
INITIAL SCORES IN EACH OF THE SEVEN BASES OF COMPARISON CONTROLLED

Columns: Sub-group; Basic study skills; Vocabulary; 4-event order: no dates;
2-event order: no dates; Relative time; 4-event order: dates; Time absurdities

Level of ability    11.83*  7.62*  .  32.98*  20.37*
Sex                 5.33  515.04*

* Differences significant at the .01 level.

According to the figures in the second row,
there was enough difference to be worth reporting 
only in the case of basic study skills and of the 
ordering of two undated events. In both instances 
superiority lay with the boys, but in only one of 
them, significantly so. 

An attempt was made to analyze the ‘‘interac-
tion’’ between method of instruction experienced
on the one hand and sex and intellectual level on
the other. (This type of analysis is represented
in item d) in the four null hypotheses stated on a
previous page.) When, as in Table III, initial
scores were controlled in all analyses, only one
significant difference was found. This had to do
with the ordering of two undated events, and it was
accounted for by the excessive influence of the
boys. The final mean scores on this criterion
were higher a) for boys of superior ability, b) for
boys in the experimental group, and c) for boys
of superior intellectual ability in the experimental
group.


Discussion 


Interpretation of the findings, to be summariz-
ed below, is dependent upon the conditions under
which the investigation was made. a) A prime
consideration is the validity and the reliability of
the measures employed. Some information has 
been given above concerning these matters, and it 
is believed that the quality of the measures is ad- 
equate for the purposes in question. b) All sub- 
jects were drawn from self-contained classrooms 
in grade six in a single mid-western community. 
How far the present findings can be generalized 
for samples of a different character, it is impos- 
sible to say. 

c) No controls were placed on the subjects’ 
previous educational achievement and experiences 
in the social studies or in any other area of the
curriculum; that is, no controls beyond those of 
random selection and statistical analysis. d) The 
instructional program used with the experimental 
subjects had few provisions for caring for indi- 
vidual differences. It probably would have proved 
much more effective with improvements at this 
point. 

e) The vocabulary taught the children was con- 
fined to the time-terms of the basal social studies 
textbook used by both groups of children. f) The 
experiment ran for only seven weeks -- relatively 
long perhaps as such experiments go, but short 
when considered educationally. What the effects of
the experimental program would be with a longer
period for learning is problematic. g) After all,
but a single instructional program was tried out,
and this without much prior evidence concerning
the probable effectiveness of its various aspects.
What might be accomplished by a different, or a
better program, is also problematic.

The findings of the investigation may be sum- 
marized as follows: 

1. The instructional program, which for seven 
weeks emphasized the identification of, and spe- 
cific instruction on terms relating to chronology 
and which made use of time lines and biographical 
and autobiographical materials prepared by the
children, fostered a) the comprehension of defi-
nite and indefinite time-terms; b) the ability to
recognize relative lengths of time between periods 
and to ascertain the similarity of time-distances 
with reference to given events; c) skill in ordering 
events with dates, and d) competence in recogniz- 
ing time absurdities. 

2. This instructional program had little or no 
effect on other study skills as measured by an ap- 
propriate test. 

3. Its usefulness in teaching the ability to or- 
der undated events varied with the number of e- 
vents to be placed in temporal sequence. It help- 
ed the more capable boys most when only two un- 
dated events were involved. 

Two additional observations seem to be war- 
ranted. The first is that, if the evidence adduced 
is trustworthy, we can say with confidence that 
children can profit from systematic instruction 
of the kind undertaken, to increase considerably
their understanding of and ability to use the time 
relationships common in the social studies. This 
judgment is in contrast with Pistor’s conclusion 
as stated above, which was to the effect that we 
can only wait for time to bring the desired chang- 
es--the skills and knowledge involved are the pro- 
duct of maturation. 

The second observation is based upon much
evidence, which can be only hinted at in this re- 
port, that children encounter a good deal of diffi- 
culty in the social studies because of their defi- 
ciencies in dealing with time relationships. The 
success of the program set up for this study im- 
plies the value of other investigations with other 
plans of instruction, including devices and aids 
not used in this research. Long-time longitudinal 
inquiries starting in grade one should also provide 
a good deal of information for the improvement of
instruction at the points of interest in the research 
described above. 

THE PLACE GIVEN to arithmetic in the elementary
school instructional program has varied
from time to time. At the end of the colonial pe- 
riod arithmetic had just emerged from occupying, 
as described by Jessup (2), an incidental place 
with no formal recognition of the subject in the 
time allotment in the daily program. However, 
the rapid development of commercial interests
soon brought out the practical values of this sub- 
ject and toward the end of the nineteenth century, 
half the school day was often devoted to arithme- 
tic. 

During the last decade of the nineteenth cen- 
tury, many discussions were held concerning the 
place that arithmetic should occupy in the elementary
curriculum and the proper content that should
be taught. The content and method of arithmetic 
prior to 1900 had been directed toward 1) mastery 
of the fundamental processes, 2) the extension of 
the basic facts to the higher decades, 3) the mem- 
orization of formulas, and 4) the application of 
these formulas in as many different situations as 
time would permit. Scientific studies were begun 
about 1900 and the years since then have witness- 
ed great changes in the development of methods of 
teaching arithmetic. 

Examination of professional literature and el- 
ementary guidebooks on the subject now reveals 
several fundamental changes in emphasis with respect
to the purpose and the place of arithmetic
as a school subject. These sources indicate that 
the aims of arithmetic still point toward mastery 
of the basic skills but that arithmetic goes beyond 
this mastery to include 1) a meaningful understanding
of numbers and the number processes,
2) the development of the ability to generalize, 
and 3) the application of arithmetical principles 
in real life situations. 

In 1923, David Eugene Smith evaluated the pro- 
gress that had been made in these words: 

‘‘It was about twenty-five years ago that the
movement for weighing values in arithmetic seriously
began in this country. It was then that
works of promise upon the teaching of the subject
began to appear, and it was only a little later that
the textbooks first attacked successfully the unreal
topics and problems of the past and set a
standard for the real ones of the present and the
future. Since that movement began there has been
a veritable revolution in the subject matter, in its
arrangement, in the spirit with which it is presented,
and in the textbooks in which the work is
set forth.’’ (5)

Research studies continued and newer editions 
of textbooks were published. In 1950, Wilburn 
and Wingo stated: 

‘‘In spite of the research and publications of
findings in the psychology of arithmetic and related
fields, and in spite of a large number of improvements
in materials for teaching, the program
of arithmetic in many elementary schools is
not significantly different from the program of
schools in the 1890’s. There are superficial differences,
to be sure, but in fundamental respects
the same procedures and the same psychology of
learning often obtain.’’ (6)

There is presently available an immense amount
of critical, scientific material bearing upon
the appropriate content and the best methods of 
teaching arithmetic. 

The textbook was chosen for analysis in this 
study as a source of the application of research 
recommendations. Reeder (4) in his historical 
survey stated that ‘‘the best expression of the 
methods of teaching any branch of the curriculum 
at any period of its history is revealed in the text- 
books of that period.’’ Judd (3) held that ‘‘there 
is no influence in American schools which does 
more to determine what is taught to pupils than 
the textbook, ’’ while Bagley (1) referred to the 
textbook as the principal agency of instruction in 
American schools. 

This study was set up to discover the trends 
that could be found in the acceptance or non-ac- 
ceptance of research recommendations in the 
teaching of arithmetic, as indicated by a study of 
the content of elementary arithmetic textbooks
published in the United States between 1900 and 
1957. A first purpose was to note the degree of 
connection between the recommendations made by 
men who had carried on scientific investigations 
in arithmetic and the changes that had been intro- 
duced into the content of arithmetic textbooks. A 
second purpose was to check the recommendations 
which seemed to have been rejected by the authors 
of textbooks although presented as substantiated, 
credible findings based upon scientific investigation.

Behind these two purposes were the ultimate 
intentions of 1) determining the status of arithme- 
tic research in the curriculum development pro- 
gram, and 2) isolating, if possible, reasons for 
the gap between recommendations for changes in 
teaching methods and the application of those re- 
commendations. 

The following questions were proposed: 1) Are 
the reasonably credible conclusions of research 
studies finding their way into textbooks? If so, at 
what rate? 2) Can reasons be identified which 
might explain the acceptance of some and the
non-acceptance of other credible research findings?
3) Do the new textbooks generally recommend 
those techniques which have been tried out experimentally
and found successful? 4) Is there any
evidence that some techniques are recommended 
by textbook authors before they have been tried 
out experimentally? 5) Is there a time pattern
which is generally followed between the recom- 
mendation of research studies and their applica- 
tion in textbooks? 

An over-all review of the available literature 
on the research studies in elementary arithmetic
was made to determine the trends that had developed
since 1900 in content and in method. Two
lists were drawn up, one with topics which 
seemed to have been definitely influenced by re- 
search recommendations, the other containing 
topics which apparently had not yet been greatly 
influenced by research. Twenty-five topics were 
thus gathered together and presented to a national 
jury of nineteen experts who chose twelve for this 
study. 

The topics which were selected by the jurors 
are listed here with the frequency of choice. 


LIST I--Topics that apparently have been influenced by research recommendations.

                                                        Frequency
1. Method of placing the decimal point in the
   quotient.                                                15
2. Apparent method vs. increase-by-one method of
   determining the quotient.                                13
3. Imaginative settings for verbal problems.                10
4. Placement of long division.                              10
5. Elimination of awkward and unrealistic fractions.         9
6. Placement of common fractions.                            9


LIST II--Topics that apparently have not yet shown the influence of research recommendations.

                                                        Frequency
1. Testing for concepts rather than for speed and
   accuracy only.                                           19
2. Developing and extending generalizations rather
   than memorizing rules.                                   17
3. Building concepts through the use of concrete
   materials in classes above the primary grades.           13
4. The use of illustrations as visual aids rather
   than as decorations.                                     11
5. Method of placing the decimal point in the
   quotient.                                                10
6. Rationalizing division of fractions through using
   the common denominator rather than the inversion
   method.                                                  10


A survey was made to locate the research recommendations
and the critical opinion on each
of these topics. This was followed by the examination
of 153 series of elementary arithmetic
textbooks published in the United States by twenty-nine
different publishers between 1900 and
1957. Trends were indicated in two ways. Tables
were compiled showing the dates when the
application of research recommendations was 
noted in each of the different series. Then fig- 
ures were drawn which showed the percentage of 
application at different periods of research ac- 
tivity. 


Summary and Conclusions 





Accepted Findings. The effect of research 
upon the content presented and the methods sug- 
gested in arithmetic textbooks was found to have 
been direct and immediate in nine of the twelve 
topics investigated. 

1. Between 1900 and 1917, 69.2 percent of the 
textbooks examined recommended using the inte- 
ger method for determining the placement of the 
decimal point in the quotient. From 1917 to 1946,
all the textbooks presented this method but since 
1946, 93.7 percent recommend it while 6.3 per- 
cent suggest teaching the older, subtractive meth- 
od. 

2. The question of the choice between the ap- 
parent and the increase-by-one method of esti- 
mating the quotient has never been settled but 
textbooks have followed research recommenda- 
tions. From 1900 to 1927, 51 percent of the au- 
thors recommended the apparent method; between 
1928 and 1940, 71.4 percent advocated the in- 
crease-by-one method. Since 1940, the trend has 
been slightly in favor of the apparent method with 
57.5 percent recommending this method, for research
has also shifted back to it.

3. Imaginative settings have been accepted as
the bases for verbal problems. Between 1900 
and 1919, 79.1 percent of the books examined 
made no use of verbal settings; from 1920 to 1935,
58.6 percent made great use of such settings while
only 3.4 percent did not use them at all. Since
1935, 90.0 percent of the books examined were
found to make great use while 1.25 percent made
no use of imaginative settings for verbal problems.

4. Awkward and unrealistic fractions also 
showed very definite trends. When the books published
between 1900 and 1920 were examined, it
was found that only 17.5 percent had eliminated 
such fractions for the most part, while 42.5 per- 
cent made no effort to avoid them. Between 1920 
and 1930, 70.0 percent of the textbooks examined 
had eliminated them and since 1930, 94.4 percent 
have been free from the large, awkward, and un- 
realistic fractions that were once widely used. 

5. Grade placement of long division and of 
common fractions has been raised. Examination 
of books published between 1900 and 1906 showed 
that 61.5 percent of the authors introduced the 
process of long division in third grade. Between 
1907 and 1930, it was raised to fourth grade in
88.2 percent; from 1931 to 1940, 50.0 percent of 
the texts introduced it in the fourth, 30.0 in the 
fifth, and 20.0 percent spread it out over both 
grades. From 1941 to 1957, 81.5 percent have 
placed it in the fifth grade. 

Between 1900 and 1910, 36.8 percent of the 
textbooks examined presented addition and sub- 
traction of common fractions in third grade and 
47.4 percent introduced these two processes at 
fourth grade level. From 1910 to 1930, 37.9 percent
advocated the fourth grade while the remaining
62.1 percent divided the teaching between
fourth and fifth grades. From 1930 to 1940, 30 
percent had moved the placement up to the fifth 
grade and since 1940, 50.0 percent have favored 
fifth grade with 44.7 percent dividing the work be- 
tween fourth and fifth, leaving only 5.3 percent 
still presenting these two processes completely in 
the fourth grade. 

Multiplication and division of fractions set a
more definite trend. From 1900 to 1925, 17.6
percent of the textbooks examined set the fourth
grade while 67.6 percent set the fifth grade as the
level for introducing these topics. Between 1925
and 1940, the fourth grade was eliminated completely,
47.8 percent set the fifth and 30.4 introduced
the work in the sixth grade. Since 1940,
67.6 percent begin instruction in these two processes
in the sixth grade.

6. Illustrations are now used as visual aids.
Between 1900 and 1937 when the earliest research
recommendations appeared, 30.4 percent of the 
textbooks examined had no illustrations of any 
kind while only 16.1 percent made use of illustra- 
tions as visual aids. Between 1937 and 1946, all 
the texts examined contained illustrations but 47.1 
percent were decorative only, 23.5 percent had 
diagrams only, and 29.4 percent used the illustrations
as visual aids. Since 1947, 80.9 percent
of the illustrations found in the textbooks examin- 
ed were used as visual aids. 

7. The common denominator method is used 
to introduce the process of division of fractions 
but is not presented as the basic method in any 
of the recent textbooks examined. From 1900 to 
1927, 55.5 percent of the books examined used 
the inversion method, 24.4 percent the common 
denominator method, and 20.0 percent the recip- 
rocal method. Between 1927 and 1941, all the 
books examined presented only the inversion 
method. Since 1942, 75.7 percent have advocated 
the inversion method while 24.3 percent introduce
the topic by the common denominator method but 
change to the inversion method almost immediate- 
ly. 

8. The inclusion of tests designed to measure 
concepts was found to be a recent trend. From
1900 to 1930, 97.0 percent of the texts examined
did not include such questions. Between 1930 and 
1940, this figure dropped to 52.6 percent as 31.5 
percent contained some questions and 15.9 per- 
cent made considerable use of questions to mea- 
sure concepts. Since 1940, 46.0 percent of the 
books examined have made considerable provision 
for them while 26.0 percent did not have any of 
these questions. 

9. The use of materials that should contribute 
to the development of generalizations was found 
to be a very recent trend. From 1900 to 1945,
92.5 percent of the texts examined made no pro- 
vision for such materials. Since 1945 when the 
Second Report of the Commission on Post-War 
Plans stressed the need for this type of material, 
56.3 percent of the texts have suggested consider- 
able use, 21.8 percent some use, and 21.8 per- 
cent have not advocated the use of this type of 
material. 

Rejected Findings. Recommendations pre- 
sented in research studies were found to have 
been modified or rejected by authors of textbooks 
in three of the topics under investigation in this
study. 

1. The major research studies which were 
carried on from 1926 to 1938 recommended that 
the introduction of the process of dividing by two 
or more figures be delayed until the sixth grade 
and spread out over three years. Modification 
of this recommendation was noted when it was 
found that 81.5 percent of the textbooks published 
since 1930 introduced the topic in the fifth grade 
and that none introduced it in the sixth or expect- 
ed the process to take more than two years for 
the complete presentation. 

2. Grade placement of common fractions fol- 
lowed a similar trend. The major research stu- 
dies recommended that these processes be spread 
out from the fifth through the ninth grades, and 
suggested that some phases should never be 
taught. It was found that addition and subtraction 
of common fractions have been placed in the fourth
grade by 5.3 percent of the texts examined, in the 
fourth and fifth grades by 44.7 percent, and in the 
fifth grade by 50.0 percent of the textbook authors 
since 1940. Multiplication and division of common
fractions have been introduced in the sixth
grade by 67.6 percent of the authors but none de- 
lay it until the seventh. All of the processes are 
completely presented by the end of the sixth grade. 

3. The use of concrete materials in classes 
above the primary grades has not been strongly 
influenced by research findings as yet. From 
1900 to 1930, 96.5 percent of the texts examined 
did not recommend the use of concrete materials 
above the primary grades. Between 1930 and 1945 
many articles advocating such use appeared in 
professional journals and the examination of text- 
books showed that 82.8 percent still did not make 
any reference to these materials but that 17.1 
percent did. Since 1940, 39.5 percent of the texts
examined have advocated the use of concrete ma- 
terials in classes above the primary grades. 
However, no research on this topic was published 
until 1950. The few studies that were located 
have reported that while such use has not retard- 
ed learning, no significant differences could be 
found. Examination of textbooks indicates that the 
trend to include such materials has continued to 
increase strongly since 1954. 


The Status of Research 


The role of research in the arithmetic curric- 
ulum development program has been an important 
one:

1. Many studies were designed to evaluate 
practices which had been in use for many years. 

2. When the recommendations were clear,
concise and exact, they were incorporated into
some textbooks within five years. 

3. When the recommendations were general, 
intangible, or based upon subjective data, they 
were not applied as rapidly as where research 
findings were precise and well supported by ade- 
quate data. The lack of clearness and explicit- 
ness in the presentation of the recommendations 
led to slowly developing trends. 

4. Those recommendations which were pub- 
lished in the yearbooks of the National Society for 
the Study of Education and the National Council of 
Teachers of Mathematics tended to be applied
very quickly. 

A QUICK--BUT ACCURATE--APPROXIMATION
TO THE STANDARD DEVIATION OF 
A DISTRIBUTION 


ROBERT L. LATHROP 
University of Minnesota 


Introduction 


TO THE person familiar with the procedure,
and having access to a desk calculator, the com- 
putation of the standard deviation of a distribution 
is not difficult. There are, however, many people 
who do not have one or both of these requisites 
but would still find the standard deviation a useful 
quantity if it could be easily and accurately ap- 
proximated. Teachers, for example, find the 
standard score an appealing unit for expressing 
test scores but will frequently not take the time or 
trouble to compute the standard deviation for that 
purpose alone. It was, in fact, this particular 
problem that led the writer to examine various 
procedures for approximating the standard devia- 
tion of a distribution of test scores. 


The Rationale 


After examining a number of alternative pro- 
cedures, a method for approximating the standard 
deviation mentioned briefly by Diederich (1) 
seemed to hold considerable promise. This meth- 
od expresses the variability of a distribution as 
the difference between two quantities drawn from 
the tails of the distribution. The logic of this approach
is much the same as the argument underlying
the use of the range or the quartile deviation
as a measure of variability. That is, for two distributions,
the one having the greater variability
will have the greater difference between two comparable
points on the distribution.

In symbolic terms, this approximation of the 
standard deviation of a finite distribution can be 
expressed as: 


$$\hat{s} = \frac{K}{N}\left[\sum X_U - \sum X_L\right] \qquad (1)$$


Where $\hat{s}$ is an approximation of the standard
      deviation $s = \sqrt{\sum (X - \bar{X})^2 / N}$.
      $K$ is a constant determined by the proportion
      selected for the upper or lower
      groups (the method for computing
      $K$ will be described in a following
      paragraph).
      $N$ is the total number of values in the
      distribution.
      $\sum X_U$ is the sum of scores in the upper portion
      of the distribution.
      $\sum X_L$ is the sum of scores in the lower portion
      of the distribution.


For initial convenience, consider X as a vari- 
able which is normally distributed. In this case: 


$$\sum X_U = Np\left[s\left(\frac{y}{p}\right) + \bar{X}\right] \qquad (2)$$


Where $p$ is an arbitrarily selected portion of
      the distribution to be included in the
      upper (or lower) group of scores.
      $y$ is the ordinate height at the point
      where the upper scores are divided
      from the remainder of the distribution.
      $\bar{X}$ and $s$ are the mean and standard deviation
      of the distribution of X.


Similarly,

$$\sum X_L = Np\left[-s\left(\frac{y}{p}\right) + \bar{X}\right] \qquad (3)$$


If not apparent to the reader, $(y/p)$ represents
the average standard score in the upper group of
scores,² $Np$ the number of values (scores) in
either extreme group, and $s$ and $\bar{X}$ the quantities
needed to convert the standard scores to raw


1. For this finite example, the author has chosen to use the symbols $\bar{X}$ and $s$ rather than the Greek letters
$\mu$ and $\sigma$ commonly associated with the infinite normal distribution.


2. For a discussion of mean sigma distances, see Wert (2:67). 
TABLE I

COMPARISON OF $s$ AND $\hat{s}$ FROM TEN ACTUAL TEST
SCORE DISTRIBUTIONS


FIGURE 1

ILLUSTRATING THE RELATIVE INSENSITIVITY OF $\hat{s}$ TO
DEVIATIONS AWAY FROM NORMALITY

score form. Since the normal distribution is
symmetrical, for equal values of p, the expressions
for $\sum X_U$ and $\sum X_L$ differ only in the sign of the average
standard score term.

Returning to equation (1) and simplifying we 
find: 


$$\hat{s} = \frac{K}{N}\left[2Nps\left(\frac{y}{p}\right)\right] \qquad (4)$$


or, $\hat{s} = 2Kys$


Since our intention is for $\hat{s} = s$ in a normal distribution,
we find:


$2Ky = 1$
or, $Ky = .500$


Because y is a function of p, the proposition can
be restated in the following way: if p is selected
as some value $0 < p \le .5$, K will be the quantity
$.500/y$.

The choice of a value for p is a matter of con- 
venience in any practical problem and usually in- 
volves choosing enough values from the extreme 
scores to smooth out chance fluctuations while 
not taking so many values that the work becomes 
prohibitive. One point which the author has found 
convenient is to set p at .167. When p is one-sixth,
y = .250 and K = 2. Thus, $\hat{s}$, the approximation
of s, can be expressed as: the difference
between the sums of the upper and lower one-sixths
of a normal distribution divided by one-half
the number of scores in the distribution.
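
The rule above can be sketched in a few lines of Python. This is an illustrative sketch rather than the author's own procedure: the function names `tail_constant` and `approx_sd`, the use of the standard library's `NormalDist` to obtain the ordinate y, and the simulated scores are all assumptions introduced here. The first function computes K = .500/y for any chosen p; the second applies equation (1) to a list of scores.

```python
import random
from statistics import NormalDist, pstdev


def tail_constant(p):
    """Return K = 1 / (2y), where y is the unit-normal ordinate at the
    point that cuts off the upper proportion p of the distribution."""
    unit_normal = NormalDist()
    z_cut = unit_normal.inv_cdf(1.0 - p)
    y = unit_normal.pdf(z_cut)
    return 1.0 / (2.0 * y)


def approx_sd(scores, p=1 / 6):
    """Approximate the standard deviation as (K / N) * (sum of the upper
    group minus sum of the lower group), following equation (1)."""
    n = len(scores)
    group_size = round(n * p)  # number of scores taken from each tail
    if group_size < 1:
        raise ValueError("too few scores for the chosen proportion p")
    ordered = sorted(scores)
    upper_sum = sum(ordered[-group_size:])
    lower_sum = sum(ordered[:group_size])
    return tail_constant(p) * (upper_sum - lower_sum) / n


# Hypothetical data: simulated, roughly normal test scores.
random.seed(0)
simulated_scores = [random.gauss(50, 10) for _ in range(6000)]
print(tail_constant(1 / 6))          # close to 2, as stated in the text
print(approx_sd(simulated_scores))   # short-cut estimate
print(pstdev(simulated_scores))      # conventional computation (divisor N)
```

With p at one-sixth the computed constant comes out very nearly 2, so in hand work the estimate is simply twice the difference of the two tail sums divided by N.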


An Empirical Comparison of $\hat{s}$ and $s$

In the case of a normal distribution, $\hat{s}$ is identical
with s. However, even when the distribution
is not normal, $\hat{s}$ is still a good approximation to
s. For comparative purposes a number of actual
test distributions were examined, with s computed
in the usual way
and then approximated by the short-cut formula
proposed above. Table I presents the results of
these comparisons.

In no case was the error over eight percent,
and in most cases it was on the order of two percent or
less. Although the comparisons for only ten distributions
are presented, they illustrate the order
of differences between $\hat{s}$ and s for a great many
other comparisons which the author has made.

To illustrate the relative insensitivity of $\hat{s}$ to
deviations away from normality, three hypothetical
distributions were constructed: 1 rectilinear;
2 bimodal; 3 highly skewed (see Figure 1).

Distributions 1 and 2 had errors of -.04. Only
in the extremely skewed distribution (error -.13)
was $\hat{s}$ a questionable approximation of s.
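
The figure reported for the rectilinear case can be checked with a short computation. The sketch below is only an illustration under stated assumptions: the distribution of 600 equally frequent scores is hypothetical (the article does not give the constructed distributions), the helper name `sixth_rule_sd` is introduced here, and the error is taken to be the relative difference between the short-cut estimate and s.

```python
from statistics import pstdev


def sixth_rule_sd(scores):
    """Short-cut estimate with p = 1/6 and K = 2: twice the difference
    between the upper-sixth and lower-sixth sums, divided by N."""
    ordered = sorted(scores)
    n = len(ordered)
    group_size = n // 6
    upper_sum = sum(ordered[-group_size:])
    lower_sum = sum(ordered[:group_size])
    return 2.0 * (upper_sum - lower_sum) / n


# Hypothetical rectilinear distribution: each score from 1 to 600 occurs once.
rectilinear = list(range(1, 601))
s = pstdev(rectilinear)               # conventional value, divisor N
s_hat = sixth_rule_sd(rectilinear)    # short-cut estimate
relative_error = (s_hat - s) / s
print(round(relative_error, 3))
```

For this flat distribution the short-cut value falls a little under four percent below s, which agrees with the error of about -.04 reported for distribution 1.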


Conclusions 


In all cases where actual test score distributions
have been considered, $\hat{s}$ has been entirely
satisfactory as an approximation of s. In computing
standard scores, for example, the usual error
is in the order of two to three percent. Because
such an error is insignificant in all but the
most demanding situations, the writer has found
the approximation described here extremely appealing
to teachers who consider the more conventional
approaches to computing s too complicated
and/or too time consuming to warrant its
calculation.

COMPARING TWO METHODS OF
WEIGHTING A SET OF SCORES 


CHARLES E. HALL
Walter V. Clarke Associates, Inc., East Providence, R.I.


ONE OF THE more common problems of applied
psychometrics is the problem of combining
a collection of test or item scores into a single
score. Often the psychometrist must choose between
the simple sum of scores and a weighted
sum, where the weights are derived according to
some optimum procedure, say scaling or regression
weights. In order to make a rational decision,
the psychometrist must either compute the
composite scores for each subject and correlate
the two composites or follow a procedure like that
recommended by Gulliksen (1950, p. 316 et seq.).
In the author’s experience, the psychometrist
does not bother to go through the long, laborious
computations, but rather selects from his own
preferences, the weighting system to be used.


Development


Consider $x_{ik}$, a set of scores on k subjects over
i measures. Let $l = (l_i)$ be one set of weights
which might be applied to these measures to obtain
a composite score as follows:

$$lx_k = l_1 x_{1k} + l_2 x_{2k} + \cdots + l_i x_{ik}$$

Suppose $m = (m_j)$ is another such set of weights.

The correlation between lx and mx can be written
as follows:

$$r_{lx,mx} = \frac{\sum_k lx \cdot mx - (1/n)\sum_k lx \sum_k mx}{\sqrt{\sum_k (lx)^2 - (1/n)\left(\sum_k lx\right)^2}\;\sqrt{\sum_k (mx)^2 - (1/n)\left(\sum_k mx\right)^2}} \qquad (1)$$

Suppose we restrict the x scores so that

$$\sum_k x_{ik}^2 = 1 \quad \text{and} \quad \sum_k x_{ik} = 0. \qquad (2)$$

With this restriction a simplification occurs.
First,

$$\sum_k lx = \sum_k \sum_i l_i x_{ik} = \sum_i l_i \left(\sum_k x_{ik}\right) = \sum_i l_i \cdot 0 = 0,$$

since $\sum_k x_{ik} = 0$.

Equation (1) then reduces to

$$r_{lx,mx} = \frac{\sum_k lx \cdot mx}{\sqrt{\sum_k (lx)^2}\;\sqrt{\sum_k (mx)^2}}. \qquad (3)$$

Now let us consider the numerator of equation
(3).

$$\sum_k lx \cdot mx = \sum_k \left[\left(\sum_i l_i x_{ik}\right)\left(\sum_j m_j x_{jk}\right)\right] = \sum_k \left[\sum_i \sum_j l_i m_j x_{ik} x_{jk}\right] = \sum_i \sum_j l_i m_j \left(\sum_k x_{ik} x_{jk}\right).$$

With the restrictions (2) above, $\sum_k x_{ik} x_{jk} = r_{ij}$
and the numerator reduces to $\sum_i \sum_j l_i m_j r_{ij} = lRm'$,
where R is the correlation matrix between the unweighted
items or test scores and l and m are
treated as vectors.

The denominator of equation (3) can be reduced
by a similar process to $\sqrt{lRl'} \cdot \sqrt{mRm'}$.

Equation (1) finally reduces to

$$r_{lx,mx} = \frac{lRm'}{\sqrt{lRl'}\;\sqrt{mRm'}}. \qquad (4)$$


Discussion


The correlation gained from this procedure is
somewhat analogous to a reliability coefficient
and, to the author’s thinking, ought to be handled
in the same way. Although the sampling error of
$r_{lx,mx}$ is that of the product moment correlation,
the interest here is the replication of information
for the two weighting systems, not the testing of
a null hypothesis. Because of this, only high values
of $r_{lx,mx}$ ought to be considered as acceptable.

It is possible to test the ‘‘un-correlation’’ of
two sets of weights by testing the significance of
the coincident coefficient of alienation,

$$a = \sqrt{1 - r_{lx,mx}^2}.$$










Two conveniences occur when using this technique
to handle data. 1) This procedure is insensitive
to multiplying all the weights of l by a constant.
2) This procedure is insensitive to the
means of the scores.

To show that the set of weights $al = (al_1, al_2, \ldots)$
is comparable to $l = (l_1, l_2, \ldots)$, we may merely
substitute al for m in equation (4), taking the
constant a to be positive:

$$r_{lx,alx} = \frac{lR(al)'}{\sqrt{lRl'}\;\sqrt{(al)R(al)'}} = \frac{a \cdot lRl'}{a\sqrt{(lRl')^2}} = 1.$$
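
Equation (4) and the first of the two conveniences can be tried out numerically. The sketch below is an illustration only: the correlation matrix and both weight vectors are hypothetical values made up for the purpose, and the helper names `bilinear` and `composite_correlation` are introduced here rather than taken from the article.

```python
import math


def bilinear(u, R, v):
    """Return u R v' for weight vectors u and v and a square matrix R."""
    return sum(u[i] * R[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


def composite_correlation(l, m, R):
    """Correlation between the composites lx and mx by equation (4):
    lRm' / (sqrt(lRl') * sqrt(mRm'))."""
    return bilinear(l, R, m) / math.sqrt(bilinear(l, R, l) * bilinear(m, R, m))


# Hypothetical correlation matrix among three tests.
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]

l = [0.42, 0.31, 0.18]   # hypothetical "optimum" weights
m = [1.0, 1.0, 1.0]      # simple sum of the three scores

r_weighted_vs_sum = composite_correlation(l, m, R)
r_rescaled = composite_correlation(l, [3.0 * w for w in l], R)
print(r_weighted_vs_sum)   # how well the simple sum replicates the weighted sum
print(r_rescaled)          # multiplying every weight by a positive constant
```

Only the correlation matrix and the two weight vectors are needed; no composite scores have to be computed for individual subjects, which is the labor the procedure is meant to save.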

To show that the procedure is insensitive to
means is more complex. First, consider the
(weighted) covariance of l(x+a) and lx.
TABLE II

RELATIONSHIPS BETWEEN THE TWO SCORES

Factor: Visualization, Perceptual Speed, Space


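$$\sum l(x+a) \cdot lx - (1/n)\sum l(x+a) \cdot \sum lx$$

$$= \sum (lx \cdot lx + la \cdot lx) - (1/n)\left[\sum lx + \sum la\right] \cdot \sum lx$$

$$= \sum lx \cdot lx + \sum la \cdot lx - (1/n)\left(\sum lx\right)^2 - (1/n)\sum la \sum lx.$$

Now $la = l_1 a_1 + l_2 a_2 + \cdots + l_i a_i$ is a constant, allowing
the reduction

$$\sum (lx)^2 - (1/n)\left(\sum lx\right)^2 + la\sum lx - (1/n)\,n \cdot la \cdot \sum lx = \sum (lx)^2 - (1/n)\left(\sum lx\right)^2.$$

Second, consider the (weighted) variance of
l(x+a).

$$\sum l(x+a) \cdot l(x+a) - (1/n)\left[\sum l(x+a)\right]^2$$

$$= \sum (lx)^2 + 2\sum lx \cdot la + \sum (la)^2 - (1/n)\left(\sum lx\right)^2 - (2/n)\sum lx \sum la - (1/n)\left(\sum la\right)^2$$

$$= \sum (lx)^2 - (1/n)\left(\sum lx\right)^2 + 2\,la\sum lx - (2/n)\,n\,la\sum lx + n(la)^2 - (1/n)\,n^2 (la)^2 = \sum (lx)^2 - (1/n)\left(\sum lx\right)^2.$$

Thus the variance of l(x+a), like its covariance
with lx, is equal to the variance of lx, showing
that the correlation between l(x+a) and lx is 1.

This technique, however, is not insensitive to
the variance of the x scores.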


An Example 


The following example is cited to show how this 
technique can be used to simplify the computation 
of factor scores. Michael, Zimmerman and Guil- 
ford (1951) collected information about high school 
girls, using seven spatial relations tests. The 
writer factor analyzed the information using Can- 
onical Factor Analysis and rotated three factors 
to a meaningful orthogonal structure. The rotat- 
ed coefficients are presented in Table I. Subse- 
quently, the factor score coefficients regressing 
univariate test scores onto the factor scores were 
computed according to the procedure outlined by 
Holzinger and Harman (1941, p. 268, equation 
12.16). These also appear in Table I. The present
objective is to simplify these regression equations
by substituting simple integers or zeros for
the decimals.

One of the difficulties of the procedure is that 
it is sensitive to the variance of the scores. 
Since the factor score coefficients are associated 
with univariate scores and it is desired to use integral
coefficients with raw scores, a little juggling
is required. If LX denotes the combination
of integral coefficients, L, and the raw scores,
X, then $(L\sigma)(X/\sigma)$ adjusts the raw scores to univariate
scores and the integral weights to comparable
factor score coefficients simultaneously.

The trick to developing comparable integral 
weights is to find products of integers and standard
deviations which are proportional to the regression
coefficients. Table I also presents the
standard deviations which are approximately pro- 
portioned to the factor score coefficients and the 
integral raw score weights. The use of these in- 
tegers makes it possible to compute reasonable 
approximations to the factor scores with an adding 
machine. Table II shows the correlation between 
the actual factor scores and approximations ob- 
tained from integral weights. Perhaps better in- 
tegral coefficients could be obtained from a more 
diligent search of the possibilities. 
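
The search for integral weights described above can be sketched as follows. Every number in this sketch is hypothetical: the correlation matrix, factor score coefficients, standard deviations, and integers are made up for illustration and are not the values from the tables of this study, and the helper names are introduced here. The sketch assumes that the integer raw-score weights L are put on the scale of the factor score coefficients by multiplying each by its test's standard deviation, and that the two weight sets are then compared through equation (4).

```python
import math


def bilinear(u, R, v):
    """Return u R v' for weight vectors u and v and a square matrix R."""
    return sum(u[i] * R[i][j] * v[j]
               for i in range(len(u))
               for j in range(len(v)))


def composite_correlation(l, m, R):
    """Equation (4): lRm' / (sqrt(lRl') * sqrt(mRm'))."""
    return bilinear(l, R, m) / math.sqrt(bilinear(l, R, l) * bilinear(m, R, m))


# All values below are hypothetical illustrations.
R = [[1.0, 0.5, 0.3],
     [0.5, 1.0, 0.4],
     [0.3, 0.4, 1.0]]
factor_score_coefficients = [0.52, 0.27, 0.11]   # apply to standardized scores
standard_deviations = [4.0, 9.0, 6.0]            # raw-score standard deviations
integer_weights = [4, 1, 1]                      # apply to raw scores

# Products of integers and standard deviations, which should be roughly
# proportional to the factor score coefficients.
comparable_weights = [L * sd for L, sd in zip(integer_weights, standard_deviations)]

r_integral = composite_correlation(factor_score_coefficients, comparable_weights, R)
print(comparable_weights)
print(r_integral)
```

Trying several candidate sets of integers and keeping the set with the highest correlation is one way to carry out the more diligent search the closing sentence mentions.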